Handling Spelling Errors
in Online Catalog Searches 

Karen M. Drabenstott and Marjorie S. Weller 

The purpose of this paper is to add to our understanding and knowledge of 
spelling errors in online catalog searches based on empirical studies of 
spelling errors in online catalog searches and suggest ways in which systems 
that detect such errors should handle the errors that they detect. One study 
of spelling errors in online catalog searches involved a categorization of user 
queries for subjects that were extracted from four university libraries' online 
catalog transaction logs. The results of the analysis demonstrated that less
than 6% of user queries that match the catalog's controlled and free-text
terms contain spelling errors. This percentage did not account for spelling
errors in user queries that failed to match the catalog's controlled and
free-text terms, because of the difficulty of verifying certain terms and
phrases and of collection failure. The results of a related study involved user
responses to an experimental online catalog that detected possibly misspelled
words. While the majority of users corrected misspelled query words, a
sizable proportion took an action that was even more detrimental than the
original misspelling; for example, they added another word or phrase to the
query in addition to the misspelled word. This paper concludes with three
recommendations for improvements to online catalogs to assist users in the
correction of misspelled query words and the detection of queries that fail
due to collection failure.



Since the introduction of online catalogs 
in the early 1980s, librarians, system design- 
ers, and researchers have had a very accu- 
rate record of users' subject and known- 
item access points in the form of transaction 
logs. Dozens of researchers with varying 
intentions have studied the access points in 
these logs, especially access points that 
failed to produce retrievals. Some re- 
searchers merely described the subject and 
known-item access points that users entered 
into online catalogs, and others constructed 
rather elaborate schemes for categorizing
access points that were successful or unsuccessful
at producing retrievals. One recurring
problem that prevents the retrieval
of bibliographic records is the
occurrence of spelling errors in online 
catalog access points. Summing up our 
knowledge about spelling errors, we know 
that users make spelling errors; such er- 
rors are not very common in online catalog 
searches, but they do result in searches 
that fail to yield retrievals; and systems can 
be programmed to detect spelling errors 
in user-entered access points. 



Karen M. Drabenstott is Associate Professor, School of Information, University of Michigan,
Ann Arbor (e-mail: karen.drabenstott@umich.edu); Marjorie S. Weller is Programmer Analyst,
Medical Center Information Technologies, University of Michigan, Ann Arbor (e-mail:
mweller@m.imap.itd.umich.edu). Manuscript received December 15, 1995; accepted for publication
February 9, 1996.
The purpose of this paper is to add to our
understanding and knowledge of spelling 
errors in online catalog searches based on 
empirical studies of spelling errors in online 
catalog searches and suggest ways in which 
systems that detect such errors should han- 
dle the errors that they detect. 

Literature Review 

Many researchers have used transaction 
logs to study actual public use of com- 
puter-based retrieval systems. In his ex- 
tensive review of transaction logs, Peters 
(1993) places studies of actual public use
into nineteen different categories (e.g., 
errors, zero hits, missed opportunities, 
failures and their causes, extent of match 
studies, and user persistence). 

Definitions of what constitutes a spelling 
error vary from transaction-log study to 
transaction-log study. Spelling errors could
include nonlegitimate queries, such as random
configurations, data-entry errors, and
graffiti. In eight studies (Markey 1984, 66;
Henty 1986, 48; Carlyle 1989, 51; Lester
1989, 172; Peters 1989, 270; Hunter 1991,
400; Zink 1991, 53; and Drabenstott and
Vizine-Goetz 1994, 161), authors reported
that a small proportion of the terms users
entered into online catalogs were not legitimate
subject queries. These included random
configurations (///, HJNVM) and data-entry
errors. Such activity could have been
exploratory, accidental (e.g., leaning on the
keyboard), or indicative of the frustration
users were experiencing with their ongoing
search. Such activity occurs with much less 
frequency than the entry of legitimate sub- 
ject queries: for example, such activity was 
only 0.4% of the random sample of subject 
queries extracted from transaction logs at 



Some researchers consider the inclu- 
sion of punctuation in queries to be a 
spelling error (Henty 1986, 50; Walter 
1987, 78; Lester 1989, 184; Drabenstott 
and Vizine-Goetz 1994, 173). Examples of 
punctuation occurring in user queries are 
possessive forms with an apostrophe, acronyms
with periods between letters, hyphenated
words and phrases, and inverted
phrases with an intervening comma.

Another type of error is run-on words or
phrases without intervening spaces
(Henty 1986, 48; Jones 1986, 6; Walter 
1987, 76). Words in the resulting phrase 
may or may not be spelled incorrectly. 

The most common types of spelling 
error involve substitution, insertion, 
transposition, or omission of one or more 
letters in words. Substitution errors result 
in the substitution of one character for 
another, e.g., "lyprosy" instead of "lep- 
rosy." Queries bearing insertion errors 
contain extra letters, e.g., "peducation" 
instead of "education." Transposition er- 
rors result in two or more characters being 
reversed, e.g., "medeival" instead of "me- 
dieval." Omission errors occur when one 
or more characters are left out of the 
word, e.g., "lanuage" instead of "lan- 
guage." Nine transaction-log studies re- 
ported these kinds of common spelling 
errors (Markey 1984, 66; Henty 1986, 48; 
Jones 1986, 4; Walter 1987, 76; Lester 
1989, 197; Peters 1989, 170; Hunter 1991, 
400; Zink 1991, 53; and Drabenstott and 
Vizine-Goetz 1994, 175). 
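
The four error types above can be told apart mechanically when the intended word is known. The following is a minimal sketch for illustration only, not a procedure taken from any of the cited studies. The function name `classify_error` is hypothetical, and the sketch assumes the query differs from the intended word by a single edit, whereas the studies also count errors involving more than one letter.

```python
def classify_error(query: str, intended: str) -> str:
    """Name the single-edit error that turns `intended` into `query`."""
    if query == intended:
        return "none"
    q, t = len(query), len(intended)
    if q == t:
        # Same length: either one character replaced or two neighbors swapped.
        diffs = [i for i in range(q) if query[i] != intended[i]]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and query[diffs[0]] == intended[diffs[1]]
                and query[diffs[1]] == intended[diffs[0]]):
            return "transposition"
    elif q == t + 1:
        # Query is one character longer: an extra letter was typed.
        if any(query[:i] + query[i + 1:] == intended for i in range(q)):
            return "insertion"
    elif q + 1 == t:
        # Query is one character shorter: a letter was left out.
        if any(intended[:i] + intended[i + 1:] == query for i in range(t)):
            return "omission"
    return "other"


# The paper's own examples, one per error type.
for wrong, right in [("lyprosy", "leprosy"), ("peducation", "education"),
                     ("medeival", "medieval"), ("lanuage", "language")]:
    print(wrong, "->", classify_error(wrong, right))
```

Anything that needs more than one edit falls into "other" here; a fuller treatment would use an edit-distance alignment instead.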

Researchers typically compare user 
queries with the words and phrases used 
in controlled vocabularies. Because cer- 
tain characteristics of user queries pre- 
vent them from being exact matches of 
controlled vocabulary terms, researchers 
sometimes consider such queries as mis- 
spellings. For example, misspellings could 
include user queries that are singular 
forms of plural controlled vocabulary 
terms and vice versa, e.g., "mosquito" in- 
stead of the subject heading "Mosquitos" 
(Carlyle 1989, 44), or abbreviated forms
of words or phrases, e.g., "20th century" 
instead of the subject heading "Twentieth 
century" (Henty 1986, 48; Walter 1987, 
76; Carlyle 1989, 44).

Although user-assisted spelling-detection
and -correction algorithms are commonplace
in today's word-processing programs,
such capabilities are not standard
in online catalogs. An early catalog — 
BACS at Washington University — fea- 
tured Soundex for spelling correction. 
Several versions of the experimental 
Okapi online catalog have featured user- 
assisted spelling detection and correction. 
Walker and Jones (1987, 76-77, 151) com- 
pared two versions of Okapi that featured 



LRTS • 40(2) • Handling Spelling Errors /115 



two slightly different user-assisted spell- 
ing-detection and -correction algorithms. 
In one version, possible misspellings were 
detected using a Soundex algorithm, users 
were informed of the possibly misspelled 
word, and one word was suggested as a 
replacement; users also were given the 
option to enter a new, different, or cor- 
rected query. The second version was the 
same as the first except that users were 
prompted to enter a new, different, or 
corrected word for the possibly mis- 
spelled word detected by Okapi. The re- 
searchers concluded that the former sys- 
tem handled 78% of cases well compared 
to the 64% of cases that the latter system 
handled well. 
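
To show how a Soundex comparison can flag a likely misspelling, here is a minimal sketch of the standard American Soundex code. This is an assumption: the paper does not say which Soundex variant BACS or Okapi used, and the function name `soundex` is chosen here for illustration.

```python
# Consonant groups of standard American Soundex; vowels, H, W, and Y get no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the four-character Soundex code of `word` ("" if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


# Misspelled queries from the transaction logs share a code with the heading word.
print(soundex("carribean"), soundex("caribbean"))
print(soundex("chernoyble"), soundex("chernobyl"))
```

Two words with the same code are only candidates for each other; a system would still need to rank candidates or ask the user, as the two Okapi versions did.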

Misspellings in User Queries 

Misspellings in user queries — the focus of 
this paper — came from data sets generated 
in two separate but related sponsored-re- 
search projects. The first project — titled 
"Enhancing a New Subject Access Design 
to Online Catalogs" — was supported by 
the OCLC Online Computer Library Cen- 
ter, Inc., Library and Information Science 
Research Grant Program (Drabenstott 
1994). We obtained transaction logs from 
the online catalogs of Syracuse University, 
the University of California, Los Angeles 
(UCLA), the University of Kentucky, and 
the University of Michigan, extracted a 
total of about two thousand user queries 
for subjects from the logs, and performed 
a manual analysis of these queries. The 
manual analysis required us to categorize 
user queries according to the types of ele- 
ments present in them (i.e., topical sub- 
jects, corporate names, geographic names, 
personal names, and combinations of two 
or more elements), develop subcategories 
of queries corresponding to the extent to 
which they matched subject headings and 
other subject-rich terms in bibliographic 
records, and identify queries that were nei- 
ther matches of subject headings nor other 
subject-rich terms in bibliographic rec- 
ords. The results of the manual analysis 
demonstrated the extent to which users en- 
tered subject queries bearing misspellings 
into online catalogs. 

The second project, titled "Testing a
New Subject Access Design to Online
Catalogs," was supported by the Department
of Education's College Library
Technology and Cooperation Grants 
(Drabenstott and Weller 1995). The pur- 
pose of this research project was to test a 
new subject-access design. This design 
featured an online catalog that had a wide 
range of subject-searching capabilities 
and search trees to govern the system's 
selection of searching capabilities in re- 
sponse to user queries. The system asked 
users to differentiate between subject 
queries bearing personal names and all 
other subject queries. On their own, the 
search trees then determined the extent 
to which user queries matched subject 
headings and other subject-rich terms in 
bibliographic records. This machine- 
based analysis resulted in the selection of 
a subject-searching approach that was 
likely to produce useful retrievals in re- 
sponse to user queries. Failure to effect a 
match between queries and the catalog's 
vocabulary sometimes meant that the 
query word or words were misspelled. 
The experimental online catalog reported 
such queries to users and asked them to 
check their queries for possible spelling 
errors. The results of this interaction be- 
tween system and users demonstrated 
how users would respond to an online 
catalog that assisted them in detecting 
misspelled queries. 

The research questions addressed in 
this paper are: (1) How prevalent are mis- 
spellings in user queries for subjects? and 
(2) How do users respond to online cata- 
logs that detect possible spelling errors in 
their queries for subjects? 

For the analyses described in this pa- 
per we considered the following to be 
spelling errors: (1) substitution, (2) inser- 
tion, (3) transposition, (4) omission of one 
or more letters or spaces in words, and (5) 
run-on words missing one or more spaces. 

Prevalence of Spelling Errors 
in User Queries 

Categorizing Subject Queries 
Extracted from Transaction Logs 

A research team at the University of 
Michigan selected the initial queries users 
entered in subject searches from the four
libraries' transaction logs. We chose initial 
queries because subsequent queries 
might have been unnecessary if catalogs 
responded to initial queries with useful 
retrievals. Queries were selected from on- 
line catalog terminals searched exclusively 
by library patrons. 

We categorized queries by the type(s) 
of elements present in them: (a) topical
subjects, (b) corporate names, (c) geo- 
graphic names, (d) personal names, and 
(e) combinations of two or more elements 
(a through d). We then subcategorized 
categorized queries using the same series 
of decisions that an online catalog that was 
programmed with search trees would 
make. 

Search trees hold much promise for 
assuming the burden of determining 
which subject-searching approach is likely 
to produce useful information for user 
queries. The designers of the Okapi ex- 
perimental online catalog first defined 
search trees as "a set of paths with 
branches or choices, which enables the 
system to carry out the most sensible 
search function at each stage of the 
search" (Mitev, Venner, and Walker 1985,
94). The search trees they implemented
in Okapi evolved through a process of
discussion and trial and error and placed
more emphasis on searching the titles
than the subject headings in Okapi's catalog
records because only half of these records
contained subject headings (Mitev,
Venner, and Walker 1985).

Some operational online catalogs have 
subject-searching routines that resemble 
search trees. For example, the online cata- 
log of the University of Illinois at Urbana- 
Champaign responds to user queries for 
subjects with keyword searches of as- 
signed subject headings. When users ter- 
minate searches, the system prompts 
them to continue and gives the results of 
a title-keyword search (Hildreth 1989). 
The Illinois online catalog always per- 
forms keyword searches of subject-head- 
ing fields before title-keyword searches 
because the former consumes fewer sys- 
tem resources than the latter. 

The search trees that we used to sub- 
categorize categorized queries were the 



result of the empirical study of the subject 
terms users entered into online catalogs 
(Drabenstott and Vizine-Goetz 1994). 
The empirical study demonstrated that 
the subject terms users entered into on- 
line systems possessed certain charac- 
teristics that revealed the subject-search- 
ing approaches most likely to succeed in 
producing assigned subject headings and 
bibliographic records on the topics users 
seek. Examples of such characteristics 
were the number of words in user queries, 
the extent to which user queries matched 
controlled vocabulary terms, and their
ability to produce retrievals in response to
certain subject-searching approaches. 

Search-tree Subcategories 

Drabenstott and Vizine-Goetz (1994) dis- 
cussed search trees in depth and provided 
flowcharts depicting search-tree decision 
points; thus, only a brief description of 
search-tree categories is given here. The 
first step was to segregate user queries 
containing personal names from user que- 
ries that did not contain personal names. 
The former queries were subjected to 
analyses that were different from the 
analyses performed on the latter queries. 
These latter queries were candidates for 
the exact-approach subcategory. To be 
placed in this category, user queries were 
compared with subject headings printed 
in the Library of Congress Subject Headings
(LCSH) and with subject headings
that could be formulated using subject
headings printed in LCSH and subdivided 
by geographic, topical, and period subdi- 
visions. On occasion, some manipulation 
would be necessary to effect an exact match. 
For example, matches were effected by the 
following: (a) ignoring capitalization, (b) re- 
moving punctuation, (c) removing stop 
words, (d) normalizing word order, (e) ig- 
noring spelling, and (f) combinations of 
categories (a) through (e). In the event an 
exact match was made, no additional analy- 
sis of the query was done. 
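
Manipulations (a) through (d) can be sketched as a normalization applied to both the query and the heading before they are compared. This is an illustrative sketch under stated assumptions, not the project's actual procedure: the stop-word list is hypothetical, word order is normalized simply by sorting, and step (e), ignoring spelling, is left out.

```python
import string

# Hypothetical stop-word list; the paper does not enumerate the words it removed.
STOP_WORDS = {"a", "an", "and", "in", "of", "the"}

# Map every punctuation character to a space so "States--History" splits in two.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text: str) -> tuple:
    """Reduce a query or heading to a comparable key."""
    text = text.lower()                       # (a) ignore capitalization
    text = text.translate(_PUNCT_TO_SPACE)    # (b) remove punctuation
    words = [w for w in text.split() if w not in STOP_WORDS]  # (c) remove stop words
    return tuple(sorted(words))               # (d) normalize word order


def is_exact_match(query: str, heading: str) -> bool:
    """True when query and heading agree after normalization."""
    return normalize(query) == normalize(heading)


print(is_exact_match("medieval art", "Art, Medieval"))
print(is_exact_match("history of the united states", "United States--History"))
```

Sorting the words is a blunt way to normalize order; it also treats differently ordered headings as the same, which a production system might not want.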

Queries were then given to a search 
tree that favored the alphabetical-ap- 
proach subcategory. If queries matched a 
longer unsubdivided subject heading, 
they met the criteria for placement in the 
alphabetical-approach subcategory. On
occasion, some manipulation would be
necessary to effect an alphabetical match.
For example, matches were effected by 
the following: (a) ignoring capitalization, 
(b) removing punctuation, (c) ignoring 
spelling, and (d) combinations of catego- 
ries (a) through (c). In the event an alpha- 
betical match was made, no additional 
analysis of the query was done. 

Remaining one-word queries were 
studied to determine whether title-key- 
word searches in operational online cata- 
logs would produce retrievals. If title-key- 
word searches failed to produce retrievals, 
the query was probably misspelled or the 
result of collection failure. 

Queries composed of two or more 
words that did not meet criteria for the 
exact approach or alphabetical approach 
remained. We performed two general 
types of keyword searches in operational 
online catalogs to produce retrievals: (1) 
keyword-in-heading searches through 
keyword-in-main-heading searches and 
keyword-in-subdivided-heading searches
and (2) keyword searches through title-keyword
searches, keyword-in-subject-heading-fields
searches, and keyword-in-record
searches. The order of keyword
searches was important. Retrievals pro- 
duced through the first few approaches 
should have been more precise than re- 
trievals produced using the last two ap- 
proaches because a single field (subject 
headings or titles) was searched. 

Queries bearing personal-name ele- 
ments were submitted to search trees re- 
quiring a different set of decisions. We 
began by differentiating personal-name 
queries bearing topical and other types of 
elements from queries bearing only name 
elements. We then tried to effect matches 
of the former with words in subdivided 
subject headings. That is, we tried to 
effect matches using the keyword-in-sub- 
divided-heading search in operational on- 
line catalogs. If this failed, we performed 
keyword-in-record searches for all ele- 
ments in user queries. If this also failed to 
produce retrievals, we ignored all ele- 
ments except the personal-name ele- 
ments in queries and used one or more 
remaining personal-name elements in 



user queries to effect a match using the 
alphabetical approach. 

Basic subcategories of user queries 
were the following: 

• Exact matches (excluding queries 
with personal-name elements) 

• Alphabetical matches (all queries) 

• Keyword-in-heading matches (all 
queries) 

• Keyword-in-record matches (all 
queries) 

• Nonmatches (none of the above four 
subcategories, excluding queries with 
personal-name elements) 
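
The order in which the subcategories are tried can be sketched as a simple cascade. This is a simplified illustration rather than the search trees of Drabenstott and Vizine-Goetz (1994): it omits the personal-name branch, spelling tolerance, and the distinction among keyword search types, and the sample headings and titles are hypothetical.

```python
def words(text: str) -> list:
    """Lowercase the text and split it into words, dropping punctuation."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return cleaned.split()


def classify(query: str, headings: list, titles: list) -> str:
    """Return the first subcategory whose test the query passes."""
    q_words = words(query)
    if not q_words:
        return "nonmatch"
    q_key = " ".join(q_words)
    h_keys = [" ".join(words(h)) for h in headings]
    if q_key in h_keys:
        return "exact"
    # Alphabetical: the query is the beginning of a longer heading.
    if any(h.startswith(q_key) and len(h) > len(q_key) for h in h_keys):
        return "alphabetical"
    # Keyword-in-heading: every query word occurs in one heading, in any order.
    if any(set(q_words) <= set(h.split()) for h in h_keys):
        return "keyword-in-heading"
    # Keyword: every query word occurs in one title.
    if any(set(q_words) <= set(words(t)) for t in titles):
        return "keyword"
    return "nonmatch"


# Hypothetical miniature catalog, for illustration only.
HEADINGS = ["Photography", "Creative ability", "South Africa -- Church history"]
TITLES = ["Critical success factors in management"]

for query in ["photography", "creative", "church south africa",
              "success factors", "flamability standards"]:
    print(query, "->", classify(query, HEADINGS, TITLES))
```

Because each test runs only when the earlier ones fail, a query is counted in exactly one subcategory, which is what lets the subcategory totals add up to the number of analyzed queries.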

Categorized Initial Access Points 

A total of 1,919 initial access points in 
subject searches were extracted from the 
transaction logs of online catalogs at 
Syracuse University (571 access points), 
UCLA (511 access points), University of 
Kentucky (418 access points), and the 
University of Michigan (419 access 
points). The total percentages of types of 
initial access points across all four libraries 
are summarized in figure 1.

Overall, about 3 of every 5 queries con- 
tained only topical elements. Personal 
names accounted for 11% of user queries. 
The most frequent multielement query con- 
tained topical and geographic elements and 
represented about 8% of user queries for 
subjects. Nonlegitimate queries numbered 
203; these included expletives, gibberish, explicit
sex terms, and known-item searches, and
accounted for 10% of user queries. When
nonlegitimate queries were discarded from
subsequent analyses, a total of 1,716 subject
queries were analyzed.

At all four data-collection sites, the ma- 
jority of user queries for subjects were 
topical subjects. Users searching the on- 
line catalogs at Syracuse, Kentucky, and 
Michigan entered large percentages of 
subject queries for personal names. Sub- 
ject queries for personal names that 
UCLA users entered were actually en- 
tered incorrectly. ULCA required users to 
use the systems Find Name or Browse 
Name commands to search personal- 
name queries rather than its Browse Sub- 
ject command. Users searching online 
catalogs at UCLA and Michigan entered 
large percentages of subject queries bearing
both topical and geographic-name elements.
Multiple-element queries (that
is, queries bearing topical elements and
one or more other element types) represented
between 2% and 17% of user queries
for subjects.



Subcategorized Initial Access Points

Exact Matches 

Our analysis of user queries for subjects
generally began with a test to determine
whether they were exact matches of controlled
vocabulary terms. Of the total of
1,716 legitimate subject queries, 832 queries
(48.5%) met the criteria for exact
matches (see table 1).

Of the 832 exact matches, most (653, 
or 78.5%) were queries for topical sub- 
jects. Spelling errors occurred in queries 
that were exact matches more frequently 
than in queries that were normalized 
matches. Spelling errors occurred in all 
types of categorized queries — queries 
bearing topical elements, geographic- 
name elements, corporate-name ele- 
ments, and a combination of topical and 
geographic-name elements. Overall, spell- 
ing errors occurred in 5.8% of exact 



TABLE 1
Types of Exact Matches

Type of Exact Match               Total       Topical     Geographic  Corporate   Topical-Geographic
                                  (N=832)     (N=653)     (N=79)      (N=40)      (N=60)
Exact                             62.9        67.7        65.8        17.5        36.7
Exact, spelling error             4.3         4.1         8.9         5.0         0.0
Exact, reference                  13.0        13.3        2.5         37.5        6.7
Exact, spelling error, reference  0.8         0.5         0.0         10.0        0.0
Normalized                        16.0        12.1        20.3        25.0        46.6
Normalized, spelling error        0.7         0.5         0.0         2.5         3.3
Normalized, reference             2.3         1.8         2.5         2.5         6.7
Total                             100.0       100.0       100.0       100.0       100.0
TABLE 2
Examples of Misspelled Exact Matches

User Query            Matching Subject Heading or Reference
catholic churchu      Catholic church
viet nam              Vietnam
psibocylin            Psilocybin
guadalupe             Guadaloupe
austrailia            Australia
austri alalia         Australia
Syracuse univeristy   Syracuse University
phptpgraphy           Photography
3therapy              Therapy, see Therapeutics


matches. Examples of misspelled user
queries that were exact matches of subject
headings or references are listed in table 2.

Alphabetical Matches 

Remaining queries were tested to deter- 
mine whether they were alphabetical 
matches of controlled vocabulary terms. 
Of the total of 1,716 legitimate queries,
155 queries (9%) met the criteria for alphabetical
matches (see table 3).

Except for queries bearing topical ele- 
ments only, alphabetical matches were 
pretty rare. Spelling errors occurred in 
three of the four types of categorized queries:
queries bearing topical elements,
geographic-name elements, and a combination
of topical and geographic-name elements.
Overall, spelling errors occurred in
5.2% of alphabetical matches. Examples of 
misspelled user queries that were alpha- 
betical matches of subject headings or ref- 
erences are listed in table 4. 

Keyword-in-Heading Matches 

Remaining queries were tested to deter- 
mine whether they were keyword-in-head- 
ing matches of controlled vocabulary terms. 
Of the total of 1,716 legitimate queries, 98 
queries (5.7%) met the criteria for keyword- 
in-heading matches (see table 5). 

User queries that were keyword-in- 
heading matches were divided between 
queries bearing topical elements only and 
queries bearing a combination of topical 
and geographic elements. The majority of
matches were matches of subdivided subject headings;
about one-quarter of matches were



TABLE 3
Types of Alphabetical Matches

Type of Alphabetical Match                          Total       Topical     Geographic  Corporate   Topical-Geographic
                                                    (N=155)     (N=130)     (N=7)       (N=10)      (N=8)
Two or more words in heading                        20.0        16.2        14.3        50.0        50.0
Two or more words in reference                      4.5         4.6         0.0         10.0        0.0
Two or more words in reference, spelling error(s)   1.9         1.5         0.0         0.0         12.5
One word in heading                                 23.2        21.5        42.9        40.0        12.5
One word in reference                               23.9        26.2        14.3        0.0         25.0
One word in heading, spelling error(s)              2.6         1.5         28.5        0.0         0.0
Less than one word in heading                       18.7        22.3        0.0         0.0         0.0
Less than one word in reference                     4.5         5.4         0.0         0.0         0.0
Less than one word in heading, spelling error(s)    0.7         0.8         0.0         0.0         0.0
Total                                               100.0       100.0       100.0       100.0       100.0






TABLE 4
Examples of Misspelled Alphabetical Matches

Query           Matching Heading or Reference                    Alphabetical-Match Type
carribean       Caribbean literature (French)                    One word in heading, spelling error
chernoyble      Chernobyl Nuclear Accident, Chernobyl, Ukraine   One word in heading, spelling error
crcreative      Creative ability                                 One word in heading, spelling error
oorrientalisra  Orientalism in art                               One word in reference, spelling error



TABLE 5
Types of Keyword-in-Heading Matches

Type of Keyword-in-Heading Match    Total       Topical     Geographic  Corporate   Topical-Geographic
                                    (N=98)      (N=53)      (N=3)       (N=0)       (N=42)
Main heading                        24.5        34.0        0.0         -           14.3
Subdivided heading                  66.3        62.2        100.0       -           69.0
Subdivided heading, spelling error  9.2         3.8         0.0         -           16.7
Total                               100.0       100.0       100.0       -           100.0




matches of main, unsubdivided subject
headings. A little under 10% of keyword-in-heading
matches involved spelling errors,
and all were connected with matches
of subdivided headings. Examples of misspelled
user queries that matched subdivided
subject headings are "south africa
and te church," "2slave religion," and "preschool
testingh."

Keyword Matches 

Remaining queries were tested to deter- 
mine whether they were keyword matches 
of controlled vocabulary terms. Of the total
of 1,716 legitimate queries, 290 queries
(16.9%) met the criteria for keyword
matches (see table 6).

Over 85% of keyword matches were 
title matches. Such matches would pro- 
voke systems governed by search trees to 
respond to matches with title-keyword 
searches. Less than 4% of keyword 
matches would provoke systems to respond
with keyword searches of subject
heading fields. A little over 10% of keyword
matches would result in keyword-in-record
searches. Spelling errors occurred
in 5.5% of queries that were keyword
matches. Examples of such
queries were "dev [...]," "critical sucsess factors," and "tetro [...]."



Nonmatches 

A set of 90 queries for topical subjects
remained. These queries failed to meet
the criteria for exact, alphabetical, keyword-in-heading,
and keyword matches.
Of these 90 queries for subjects generally,
71 queries bore topical elements only, 8
bore corporate-name elements only, and
11 bore a combination of topical and geographic
elements. The reasons these queries
were not matches of subject headings,
words in titles, or words in other subject-rich
fields of bibliographic records sometimes
involved spelling. For example, the
following nonmatching queries contained 
spelling errors: 

• flamability standards 

• black playwrites 

• federalism and jefersonianism 
TABLE 6
Types of Keyword Matches

Type of Keyword Match                         Total       Topical     Geographic  Corporate   Topical-Geographic
                                              (N=290)     (N=249)     (N=3)       (N=1)       (N=37)
Two or more title words                       73.5        73.9        100.0       100.0       67.6
Two or more title words, spelling error       4.1         4.8         0.0         0.0         0.0
One title word                                7.2         8.0         0.0         0.0         2.7
One title word, spelling error                0.7         0.8         0.0         0.0         0.0
Words in subject heading fields               3.8         2.8         0.0         0.0         10.8
Words in subject-rich fields                  10.0        8.9         0.0         0.0         18.9
Words in subject-rich fields, spelling error  0.7         0.8         0.0         0.0         0.0
Total                                         100.0       100.0       100.0       100.0       100.0



• rome oly 

• transvesticism 

• severly handicapped students 

If some of these queries (e.g., "rome 
oly," "severly handicapped students," 
"transvesticism") were submitted to trun- 
cation, they might produce retrievals be- 
cause truncation would be forgiving about 
spelling errors — that is, truncation would 
eliminate the misspelling from the word 
and a correctly spelled stem would re- 
main. Other queries (e.g., "flamability 
standards," "black playwrites," "federal- 
ism and jefersonianism") might require 
both truncation and spelling correction to 
produce retrievals. The latter queries 
might not result in retrievals because they 
are too specific. Perhaps subject searches 
of journal article abstracts, back-of-the- 
book indexes, or tables of contents might 
produce retrievals, but these literary ele- 
ments are seldom indexed in online cata- 
log databases. 
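
The effect of truncation on a misspelled word can be sketched as a prefix search against the catalog's vocabulary. This is an illustration only: the function name, the sample vocabulary, and the fixed stem length of five characters are all assumptions, not features of any catalog described here.

```python
def truncated_matches(query_word: str, vocabulary: list, stem_length: int = 5) -> list:
    """Return vocabulary words that begin with the first `stem_length`
    characters of the query word."""
    stem = query_word.lower()[:stem_length]
    return sorted(w for w in vocabulary if w.lower().startswith(stem))


# Hypothetical vocabulary, for illustration only.
VOCABULARY = ["severely", "several", "transvestism", "olympics", "flammability"]

print(truncated_matches("severly", VOCABULARY))         # error lies after the stem
print(truncated_matches("transvesticism", VOCABULARY))  # error lies after the stem
print(truncated_matches("flamability", VOCABULARY))     # error lies inside the stem
```

The first two queries retrieve candidates because the misspelling falls after the stem, though truncation also pulls in unrelated words such as "several". The third retrieves nothing, which is why such queries would need spelling correction as well.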

It was difficult for us to quantify spelling
errors in nonmatching queries because
we were unable to verify certain
terms and phrases. Examples were "nean- 
der valley," "rayonnant architecture," 
"gnatting," "race g," "psychosis icu," and 
"cremastogaster pilosa." Much reference 
work and discussion with colleagues enabled
us to verify the query "l'arche" (an
international organization that assists
mentally challenged adults) long after
the project was completed.

Many nonmatching queries were 
spelled correctly. Examples are: 

• smoking woman 

• keystone corporation 

• french occupation in chad 

• luria-nebraska assessment battery 

• toshiba affair 

A combination of a number of tech- 
niques (e.g., truncation, matches on fewer 
than all words in queries) would probably 
result in matches that would lead to re- 
trievals. As a last resort, searches of jour- 
nal article abstracts, back-of-the-book in- 
dexes, or tables of contents might produce 
retrievals because the subjects repre- 
sented by these queries are too specific to 
be treated in full-length books, mono- 
graphs, and journal titles. 

Spelling Errors and Matches 

Spelling errors in 5.9% of 1,375 queries 
for subjects generally prevented exact, al- 
phabetical, keyword-in-heading, and key- 
word matches. Spelling errors were not 
pervasive in a particular match type. Gen- 
erally, spelling errors occurred in between 
5% and 6% of exact, alphabetical, keyword-in-heading,
and keyword matches.
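
The overall figure can be checked as a weighted average of the per-type rates reported in the preceding sections. The short calculation below uses only numbers stated in this paper; the variable names are illustrative.

```python
# Match type: (number of matching queries, percentage with spelling errors).
matches = {
    "exact": (832, 5.8),
    "alphabetical": (155, 5.2),
    "keyword-in-heading": (98, 9.2),
    "keyword": (290, 5.5),
}

total = sum(n for n, _ in matches.values())
misspelled = sum(n * pct / 100 for n, pct in matches.values())
overall = 100 * misspelled / total

print(total)              # number of matching queries
print(round(overall, 1))  # overall percentage with spelling errors
```

The four counts sum to 1,375, and together with the 90 nonmatches and the 251 personal-name queries they account for all 1,716 legitimate queries.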

Although spelling errors in an addi- 
tional 90 queries for subjects generally 
occurred in only a fraction of these que- 
ries, systems would have a difficult time 
finding them on their own because re- 
trievals would be possible only after spell- 
ing was corrected and the system per- 
formed a second matching technique 
(e.g., truncation, searching for fewer than 
all words in queries). Spelling errors 
would also be difficult to detect because 
the system's failure to produce retrievals 
might be due to collection failure. 
Searches that are more comprehensive or 
larger than library cataloging databases 
might provide more detail and greater 
depth about an item's subject matter, e.g., 
tables of contents, back-of-the-book in- 
dexes, or full texts. 

Subject Queries 
for Personal Names 

Search trees for subject queries for personal
names consider user queries bearing
personal-name and topical elements as
candidates for keyword-in-heading matches
or keyword matches. Failure to produce
matches for these two keyword matches
would result in the omission of topical
elements and the submission of name elements
only to alphabetical matching. Of
the total of 251 queries that contained
personal-name elements, only 32 queries
contained a combination of personal-name
and topical elements; thus, these
queries were submitted to keyword-in-heading
matches or keyword matches.
Three queries contained spelling errors
that prevented keyword matches, viz.
"skinner and sibling^," "delacroix and
coir," and "clarence darrow's relegious
views." Omitting the misspelled topical
elements from each query left one or
more personal-name elements that could
be used by the alphabetical approach to
find the appropriate location in an alphabetical
list of personal-name subject headings
where the personal-name elements
of the query might be listed.

Search trees would treat the remaining
219 queries bearing personal-name elements
only by submitting them to alphabetical
matching, that is, using the name
elements to find the appropriate location
in an alphabetical list of personal-name
subject headings where the personal-name
elements of the query might be
listed.

Quantifying spelling errors in subject 
queries for personal names was irrelevant 
for two reasons. First, computer-based re- 
trieval systems can perform matching 
techniques that forgive spelling errors. 
Second, some queries bore names that 
were impossible for us to verify, so we did 
not know whether such queries contained 
misspellings or named individuals for 
whom no monographic literature was 
available. When queries for personal 
names were misspelled, users might have 
found the desired personal-name heading 
rather quickly because the spelling error 
was toward the end of the name or the 
name stem was rather unique and there 
were likely to be few names beginning 
with the stem. Examples are the following 
misspelled queries for personal names: 

• bosc hieronymus 

• shakespear 

• philoctetess 

• aphropdite 

• nanbnancy holt 

Users who entered queries for the fol- 
lowing misspelled names into systems that 
responded with an alphabetical list of per- 
sonal-name subject headings would prob- 
ably have to browse many, many lists to 
find the desired name quickly because the 
spelling error was at or toward the begin- 
ning of the name or the basic word stem 
was not unique and there were likely to be 
many names beginning with the stem: 

• hitckcock 

• farakan louis 

• daili 

• magrite rene 

• lfeinbloom deb 

• hyppocrates 

• nmarlow philip 

We were unable to verify a total of 
fourteen names. Examples are: 

• prosser waiter lee 

• steinway henry 

• n schribner richard 

• abrahams r d

• klagsburn

Search trees for personal-name queries
are forgiving about spelling errors because
they always respond with an alphabetical
list of personal-name subject
headings in the alphabetical neighbor- 
hood of user-entered, personal-name ele- 
ments of queries. User perseverance and 
the uniqueness of personal-name ele- 
ments in queries vis-a-vis personal-name 
subject headings in the alphabetical 
neighborhood of the desired name have a 
large stake in the outcome — that is, 
whether users find the desired names. 
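
The alphabetical-neighborhood response described above can be illustrated with a minimal sketch. The function name, the window sizes, and the sample heading list are hypothetical and are not taken from ASTUTE; the sketch only assumes that headings are filed in simple character order.

```python
from bisect import bisect_left

def alphabetical_neighborhood(headings, query, before=2, after=3):
    """Return the headings that file around the place where `query` would file."""
    ordered = sorted(h.lower() for h in headings)
    # Position at which the query would be inserted in the sorted list.
    pos = bisect_left(ordered, query.lower())
    return ordered[max(0, pos - before): pos + after]

# Hypothetical personal-name subject headings.
headings = [
    "hippocrates", "hitchcock, alfred", "hitler, adolf", "shakers",
    "shakespeare, william", "shapiro, karl", "shaw, george bernard",
]

late_error = alphabetical_neighborhood(headings, "shakespear")
early_error = alphabetical_neighborhood(headings, "hyppocrates")
```

With this sample list, an error at the end of the name ("shakespear") lands next to the desired heading, whereas an error near the beginning ("hyppocrates") files several entries away from "hippocrates," so the user would have to browse backward to find it.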

Using an Experimental Online 
Catalog to Detect Possible 
Misspellings 

For our second research question we re- 
port on the results of user responses to an 
experimental online catalog that detected 
possible misspellings in user queries for 
subjects. The experimental online catalog 
was developed in a research project titled 
"Testing a New Subject Access Design to 
Online Catalogs." This design featured an 
online catalog that had a wide range of 
subject-searching capabilities and search 
trees to govern the system's selection of 
searching capabilities in response to user 
queries. The search trees utilized match- 
ing techniques to determine the extent to 
which user queries matched subject head- 
ings and other subject-rich terms in bibli- 
ographic records. These techniques were 
the same as the techniques used in the 
matching study that is the subject of the 
first half of this paper. That is, the system 
asked users to differentiate their queries 
for subjects generally from their queries 
for personal names. Search trees then 
sought exact, alphabetical, keyword-in- 
heading, and keyword matches of subject 
headings or subject-rich fields of bibliog- 
raphic records and responded with sub- 
ject-searching approaches corresponding 
to the types of matches made. Search 
trees chose subject-searching approaches
that were likely to produce useful retrievals
in response to user queries; thus, they
favored controlled vocabulary over free-text
searching approaches. Failure to effect
a match between queries and the
catalog's vocabulary sometimes meant
that the query word or words were misspelled.
The experimental online catalog
reported such queries to users and asked 
them to check their queries for possible 
spelling errors. The results of this interac- 
tion between system and users demon- 
strated how users would respond to an 
online catalog that assisted them in de- 
tecting misspelled queries and, thus, an- 
swered our second research question, viz. 
"How do users respond to online catalogs 
that detect possible spelling errors in their 
queries for subjects?" 

Experimental Online Catalog 
Development 

The experimental online catalog named 
ASTUTE (A Search Tree Underlying The 
Experiment) was developed by a project 
team at the University of Michigan to test 
the new subject-access design. The team 
programmed ASTUTE on a stand-alone 
Gateway 2000 486, 33 MHz, IBM-com- 
patible microcomputer, with 8 megabytes 
of RAM and a VGA color monitor. The 
operating system was MS-DOS version 
5.0. A dot-matrix printer and a mouse 
were attached to the microcomputer for 
use by ASTUTE project staff during de- 
velopment work and end users during on- 
line retrieval tests. 

The databases of the ASTUTE experi- 
mental online catalog were created from 
two data sources: (1) Machine-Readable 
Cataloging (MARC) records for biblio- 
graphic data from the two participating 
libraries in selected subject areas of the 
Library of Congress Classification (LCC) 
and (2) MARC records for subject- 
authority data from the compact disc- 
based product CD/MARC Subjects dis- 
tributed by the Library of Congress. The 
number and subject areas of MARC bib- 
liographic records were: 

1. Mardigian Library of the University 
of Michigan-Dearborn: 14,686 bibli- 
ographic records in Computer Sci- 
ence (QA76) and Technology (T-TX) 

2. Lilly Library of Earlham College: 
11,976 bibliographic records in 
American History (E1-F1199) 

The ASTUTE project team did not
combine bibliographic records into a single
database. Rather, the team used the
two libraries' bibliographic records to cre- 
ate separate, searchable databases, one on 
computer science and technology for the 
University of Michigan-Dearborn (UM- 
D), and one on American history for 
Earlham College. 

Subject Searching in the 
Experimental Online Catalog 

We tested the retrieval effectiveness of 
the experimental online catalog with 
search trees by comparing its perform- 
ance with the performance of an experi- 
mental online catalog in which subject- 
searching approaches were assigned at 
random. To accomplish this, we designed 
the ASTUTE experimental online catalog 
to feature two online catalogs: (1) the Blue 
System, in which search trees governed 
the system's selection of a subject-search- 
ing capability, and (2) the Pinstripe Sys- 
tem, in which the system selected a sub- 
ject-searching capability randomly. These 
systems were purposely designed to be 
very much alike to focus the attention of 
library patrons and staff on the retrieval of 
useful information in response to their 
queries. The Blue and Pinstripe Systems 
had virtually the same interfaces, and they 
accessed the same bibliographic and 
authority databases. Except for the Blue 
System's enhancement with the search 
trees, the two systems and their capabili- 
ties were the same. 

Search trees exemplified the searching 
strategies used by expert search interme- 
diaries. Intermediaries use controlled vo- 
cabulary because it yields relevant output. 
When controlled vocabulary is not avail- 
able to express user queries, intermediar- 
ies conduct free-text searches of titles and 
abstracts to retrieve a few relevant re- 
cords, review results to find relevant con- 
trolled vocabulary, and then incorporate 
such vocabulary into the ongoing search. 
The search trees performed in a similar 
manner. They invoked searching ap- 
proaches that looked for matches of user 
queries in subject-heading fields of cata- 
loging records before enlisting keyword- 
search approaches that looked for
matches in title fields or in a combination
of title and subject-heading fields. Like 
the matching studies in the first half of this 
paper, search trees for queries for subjects 
generally effected exact matches, alpha- 
betical matches, and keyword-in-heading 
matches, that is, matches of controlled 
vocabulary terms, before effecting keyword
matches, that is, matches of free-text
words and phrases in bibliographic records.
Also, search trees for subject queries
for personal names effected keyword-in-heading
matches of name and topical
elements in user queries before ignoring
topical elements and displaying an alphabetical
browsing list of personal-name
subject headings in the alphabetical
neighborhood of personal-name elements
in user queries. Thus, decisions that the
search trees made about responding to 
user queries with matches of subject 
headings and words in bibliographic re- 
cords were very similar to the decisions 
that judges made about matching user 
queries in the matching study described 
in the first half of this paper. 
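
The ordering of match types for queries for subjects generally can be sketched as a simple cascade. This is an illustration only, not ASTUTE's code: the record layout, the tokenizer, and the simplified definitions of each match type (for example, treating an alphabetical match as "a heading begins with the query") are assumptions made for the example.

```python
import re

def tokens(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search_tree(query, records):
    """Try controlled-vocabulary matches first, then free-text keyword matches."""
    q_words = tokens(query)
    if not q_words:
        return "no match", []
    q_norm = " ".join(q_words)
    headings = sorted({" ".join(tokens(s)) for r in records for s in r["subjects"]})
    # 1. Exact match: the query is identical to a subject heading.
    if q_norm in headings:
        return "exact", [q_norm]
    # 2. Alphabetical match: a subject heading begins with the query.
    alpha = [h for h in headings if h.startswith(q_norm)]
    if alpha:
        return "alphabetical", alpha
    # 3. Keyword-in-heading match: every query word occurs in one heading.
    kih = [h for h in headings if set(q_words) <= set(h.split())]
    if kih:
        return "keyword-in-heading", kih
    # 4. Keyword match: every query word occurs in the title or headings of one record.
    kw = [r["title"] for r in records
          if set(q_words) <= set(tokens(r["title"]))
          | {w for s in r["subjects"] for w in tokens(s)}]
    if kw:
        return "keyword", kw
    return "no match", []

# Hypothetical bibliographic records.
records = [
    {"title": "Steam power in American industry",
     "subjects": ["Steam-engines -- History", "Industries -- United States"]},
    {"title": "Programming in C",
     "subjects": ["C (Computer program language)"]},
]
```

In this sketch a query such as "c programming" falls through the three controlled-vocabulary steps and is satisfied only by the free-text keyword step, while "nanotechnology" reaches the end of the cascade with no match, which is the point at which a system like the one described here would ask the user to check spelling.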

Detection of Possibly Misspelled 
Words in User Queries 

The ASTUTE experimental online cata- 
log did not feature automatic spelling cor- 
rection. It did, however, inform users of 
query elements that failed to produce re- 
trievals and suggest users check such ele- 
ments for spelling errors. 

The Blue System checked to deter- 
mine whether each word in user queries 
for subjects generally was posted in its 
database following the system's failure to 
make matches through the exact and al- 
phabetical approaches. It checked words 
in queries from left to right. An example 
is the user query "noegro peospirity in late 
1920s." The system failed to find the word
"noegro," informed the user of its failure, 
and suggested the user check spelling (see 
figure 2). The user corrected this word 
and submitted a new query to the sys- 
tem — "negro peospirity in late 1920s." 
The system failed to find the word "peos- 
pirity," informed the user of its failure, 
and suggested the user check spelling (see 
figure 3). After responding to several system
prompts to check spelling, the user
eventually entered a query that contained
no spelling errors, viz. "negro prosperity
in late 1920s."

Figure 2. Informing Users of Possible Misspellings
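
The left-to-right check for unposted words can be sketched in a few lines. The function name and the sample keyword index are hypothetical, and the sketch assumes that every word of the corrected query, including "in," is posted in the index.

```python
def first_unposted_word(query, keyword_index):
    """Scan query words left to right; return the first word absent from the index."""
    for word in query.lower().split():
        if word not in keyword_index:
            return word
    return None  # every word is posted

# Hypothetical keyword index for the example query.
keyword_index = {"negro", "prosperity", "in", "late", "1920s"}

attempts = [
    "noegro peospirity in late 1920s",
    "negro peospirity in late 1920s",
    "negro prosperity in late 1920s",
]
reported = [first_unposted_word(q, keyword_index) for q in attempts]
```

Because the scan stops at the first unposted word, a query with two misspellings produces two separate prompts, which mirrors the sequence of prompts in the "noegro peospirity" example.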

When the Pinstripe System's random- 
selection algorithm called for the alpha- 
betical approach, the system made no at- 
tempt to find possible spelling errors. 
When it called for keyword-in-subdi- 
vided-heading or keyword searches, the 
system performed the same error-check- 
ing routine as the Blue System. That is, it 
checked a keyword index to determine 
whether the individual words in queries 
were used in the database. It checked 
the words in queries from left to right, 
informed users of query elements that 
failed to produce retrievals, and sug- 
gested users check such elements for 
spelling errors. 

Queries that the Blue and Pinstripe 
Systems identified as having possible 
spelling errors would be considered non- 
matches in the matching study that is the 
focus of the first half of this paper. This 
did not always mean that query words 
were misspelled. Automatic truncation or
matches on fewer than every word in queries
or a combination of these two techniques
could have resulted in matches
and, thus, retrievals. Also, collection fail- 
ure could be the reason why ASTUTE 
failed to produce retrievals for words in 
user queries. 
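
A minimal sketch of how truncation and matches on fewer than every word might be combined follows. It is an assumption-laden illustration rather than a description of ASTUTE: each query word is treated as a truncated stem, and words are dropped one at a time until some heading matches. The function name and the sample headings are hypothetical.

```python
from itertools import combinations

def relaxed_search(query, headings):
    """Match truncated query words against headings, dropping words if needed."""
    words = query.lower().split()
    # Try all words first, then progressively smaller subsets of the query.
    for size in range(len(words), 0, -1):
        for subset in combinations(words, size):
            hits = [
                h for h in headings
                # Truncation: a query word matches any heading word it begins.
                if all(any(hw.startswith(w) for hw in h.lower().split())
                       for w in subset)
            ]
            if hits:
                return subset, hits
    return (), []

# Hypothetical subject headings.
headings = ["reinforced concrete", "air bags", "heat transfer"]
```

For example, "reinforce concrete" matches "reinforced concrete" through truncation alone, and "air bag safety" matches "air bags" only after "safety" is dropped. Very short stems would match too many headings, so a production system would probably need a minimum stem length.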

Administering Comparison Search 
Experiments in Libraries

The ASTUTE project team transported 
the Gateway microcomputer bearing 
ASTUTE to the two data-collection 
sites — Mardigian Library at the Univer- 
sity of Michigan-Dearborn and Lilly Li- 
brary at Earlham College. The microcom- 
puter was dedicated to use of the 
ASTUTE experimental online catalog. At 
UM-D, ASTUTE was located in a quiet 
study area of the library that was also near 
the computer science, engineering, and 
technology stacks. Thus, ASTUTE 
searchers would not have to go very far to 
access the library material they retrieved 
in their searches of the experimental 
online catalog. At Earlham College,
ASTUTE was located in the reference
area of the library near the library's
MARCIVE CD-ROM-based online catalog
and other CD-ROM reference
sources. Lilly Library reference staff were
also nearby and directed patrons to AS- 
TUTE when they felt patrons would find 
useful material in the system. At both li- 
braries, signs were placed near ASTUTE 
to attract library patrons to use the system. 

The ASTUTE experimental online 
catalog performed recruiting functions on 
its own. Introductory screens invited us- 
ers to participate in the experiment; told 
users how to operate the keyboard and 
mouse, make selections, and print 
screens; and asked them to conduct a 
computer-based search on a topic of their 
own choosing in the system. ASTUTE 
told users it was logging their searches, 
relevance assessments of displayed titles,
and responses to questions. Library
users were entirely on their own to read
screens, conduct searches, and answer
questions.

The data-collection period at UM-D 
lasted five weeks, from March 12 to April 
19, 1993. ASTUTE administered a total of
826 Comparison Search Experiments. At
Earlham College, data collection lasted 
thirteen weeks, from February 23 to May 
28, 1993. ASTUTE administered a total of 
238 Comparison Search Experiments. 
Thirty-three of the total 1,064 search ad- 
ministrations involved library staff at the 
two participating libraries. 

Interviewers were not present to 
monitor system use; consequently, we ex- 
pected searches for topics that were not 
represented in the experimental online 
catalog. We also expected searchers to 
leave the experiment without completing 
the full search administration. To deter- 
mine usable search administrations for 
submission to data analyses, the ASTUTE 
project team had to manually review 
searches and queries. Of the 1,064 search 
administrations, about half (528 of 1,064 
administrations) were usable. About a 
third (34%) were unusable queries that 
were entered into the experimental online 
catalog's subject-searching capabilities for 
subjects generally. About three-quarters 
of these unusable subject queries were 
out-of-scope, that is, the bibliographic-record
databases did not contain titles for
the requested topics. Most other unusable
subject queries were characterized as 
playing or meaningless input, e.g., sex 
terms, expletives, blanks, one or more of 
the same letters, gibberish. About one- 
eighth (12%) were unusable queries for 
personal names. Some unusable queries 
for personal names were names that were 
out-of-scope, others were elements of 
known-item searches, and others were 
playing or meaningless input. Less than 
5% of unusable searches were search ad- 
ministrations in which users completed 
one or more presearch questions, but they 
did not continue with their searches. 
These users probably walked away from 
the system, and it eventually reset itself to 
the introductory screen savers. 

A large percentage (43%) of usable 
administrations of the Comparison Search 
Experiment were full administrations. Of 
the four partial-administration categories, 
the largest percentage (29%) contained 
the three complete events; unfortunately, 
users walked away before completing the 
postsearch questionnaire. 

Details about individual search ad- 
ministrations are given in the final report 
of the project (Drabenstott and Weller 
1995). Our focus in this paper is on users'
responses to the experimental online catalog's
suggestions that their queries might
be misspelled. 

User Responses to ASTUTE's
Suggestion of Misspelled 
Words in User Queries 

The experimental online catalogs re- 
sponded to 134 queries with the messages 
in figures 2 and 3 that informed users that 
their queries contained possible spelling 
errors. In table 7 we describe what users 
did next. 

Large numbers of users entered que- 
ries on different topics. Examples of que- 
ries bearing unposted words and the que- 
ries users entered following the system's 
message informing them of a possible 
spelling error are listed in table 8. Words 
in italics were the unposted words that the 
experimental online catalog displayed to 
users for their correction. 

Following the system's message informing
them of a possible spelling error,
large numbers of users entered the same
query one or more times. Examples of
such queries are "internet," "hovercraft,"
"androids," "barcode," "reinforce concrete,"
and "nanotechnology." Perhaps
such users reentered the same queries
because they wanted to make absolutely
sure that the system had no titles on these
topics. Reentering such queries, users
might have been saying to themselves,
"There's got to be information on this
topic in here somewhere."

TABLE 7

User Actions Following System Message Regarding Possible Spelling Errors

                                                         UM-D          Earlham
User Actions                                          No.      %      No.      %
Entered query on different topic                       34    27.2       1    11.1
Entered same query                                     27    21.6       0     0.0
Quit search                                            15    12.0       2    22.2
Corrected spelling                                     11     8.8       5    55.6
Entered same query minus unposted word                 12     9.6       0     0.0
Entered same query and added new word(s)               11     8.8       0     0.0
Entered new query with same stem as previous query      7     5.6       0     0.0
Entered singular or plural form of previous query       7     5.6       0     0.0
Entered acronym or spelled it out                       1     0.8       1    11.1
Total                                                 125   100.0       9   100.0
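
The percentages in table 7 can be rechecked from the counts. The short sketch below does so; the category labels are abbreviated, and the Earlham cells that are blank in the table are assumed to be zero because their percentages are given as 0.0.

```python
um_d = {
    "Different topic": 34, "Same query": 27, "Quit search": 15,
    "Corrected spelling": 11, "Minus unposted word": 12,
    "Added new word(s)": 11, "Same stem": 7, "Singular or plural": 7,
    "Acronym": 1,
}
earlham = {
    "Different topic": 1, "Same query": 0, "Quit search": 2,
    "Corrected spelling": 5, "Minus unposted word": 0,
    "Added new word(s)": 0, "Same stem": 0, "Singular or plural": 0,
    "Acronym": 1,
}

def percentages(counts):
    """Express each count as a percentage of the column total, to one decimal."""
    total = sum(counts.values())
    return {action: round(100 * n / total, 1) for action, n in counts.items()}

um_d_pct = percentages(um_d)
earlham_pct = percentages(earlham)
# Totals cited in the text: all flagged queries, and users who corrected spelling.
flagged = sum(um_d.values()) + sum(earlham.values())
corrected = um_d["Corrected spelling"] + earlham["Corrected spelling"]
```

The column totals sum to the 134 flagged queries, and the two "Corrected spelling" cells sum to the sixteen users who corrected their misspellings.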

Following the system's message in- 
forming them of a possible spelling error, 
fifteen users at UM-D and two users at 
Earlham quit searching. 

A total of sixteen users corrected the 
misspellings in their queries (see table 
9 — words in italics were unposted words 
that the experimental online catalogs sug- 
gested to users were misspelled). 

Of the total 134 queries in which the
experimental online catalog detected unposted
words, 28 queries actually contained
misspelled words. Users corrected
16 of these queries. Examples of queries
that users did not correct were "elecctronics,"
"circuts," "assemlby language," "cobal
languages," and "ei\\EIFFEL." In response
to the system's message informing
the latter users of a possible misspelling,
some users quit searching, other users entered
different queries, and still other users
added new words or deleted the possibly
misspelled words from queries.

TABLE 8

Succeeding Queries on Different Topics

Possibly Misspelled Queries    Next Queries on Different Topic
internet                       Usenet
microcad graphing              database
radiator design                heat transfer
zirconia                       chemistry
general relativity             computers
The z8 microcomputer           zilog microcomputers

A total of 23 users responded to the 
system message about possible misspell- 
ings by entering the same query minus the 
unposted word or by adding a new word 
to the same query (see table 10 — words in 
italics were unposted words that the ex- 
perimental online catalogs suggested to 
users were misspelled). 

A handful of queries was placed in the
remaining categories. Examples of succeeding
queries that had the same stem as
preceding queries were "automanual" and 
"auto," "encoding" and "encode," "ar- 
chitectual design" and "architecture." Ex- 
amples of succeeding queries that were 
singular or plural forms of preceding que- 
ries were "florida keys" and "florida key," 
"air bag" and "air bags," and "rotation of 
axis formula" and "rotation of axes for- 
mula." Two queries contained acronyms, 
"IWW" and "International Workers of the 
World" and "cobal languages" and "com- 
mon business oriented language." 

Frequently, the experimental system 
informed end users that their entered 
terms might be misspelled when, in fact, 
their entered terms were not posted in the 
database. 





TABLE 9

Misspelled Queries

Possibly Misspelled Queries          Corrected Queries
human power vechicles                human power vehicles
communcations                        communications
carsuspension and handling           car suspension and handling
chemistryy                           chemistry
abolishionism                        abolitionism
c programming lanuage                c programming language
monitrins performance                monitoring performance
  of telephone operators               of telephone operators

TABLE 10

Queries with Words Added or Deleted

Original Queries                     Added or Deleted Words in Subsequent Queries
rotation of axis                     rotation of axis formula
probability and stdtisics            probability
equilizer filters                    filters
graphing                             microcad graphing
cache memory                         memory
internet                             internet network
three-dimentional dynamics           dynamics



Incorporating Spelling 
Assistance in Online Catalogs 

Users enter subject queries that contain 
spelling errors. This is not an especially 
serious problem with respect to legitimate 
user queries for subjects generally be- 
cause spelling errors occur in a little less 
than 6% of such queries. 

Spelling is also not a serious problem 
with respect to legitimate user queries for 
personal names, especially in catalogs that 
respond to such queries with an alphabeti- 
cal listing because users can browse alpha- 
betical lists to find the desired names. It 
is, however, quite difficult for systems to 
distinguish on their own personal-name 
elements of user queries from topical and 
other types of elements. If users distin- 
guish such elements for systems, systems 
can then use this knowledge to check the 
spelling of topical and other non-name 
elements and, as a last resort, respond to 
users with the results of an alphabetical 
search for the personal-name elements 
only of personal-name subject queries 
when users fail to correct misspelled non- 
name elements. It is difficult for systems 
to detect misspelled personal-name ele- 
ments because of the many variants for 
even seemingly simple names, e.g., Smith, 
Smithe, Smidth, Smitt, Smitz, Smyth, or 
Smythe. The alphabetical approach that 
was the default response in ASTUTE's
Blue System to personal-name queries
bearing personal-name elements only will 
help users whose personal-name queries 
are in the same alphabetical neighbor- 
hood as listed personal-name subject 
headings or especially persevering users 
who are willing to browse backward and 
forward for the desired name. 

Despite the infrequency of spelling er- 
rors, such errors can completely derail the 
most routine subject search. Examples 
come from a search in which a user began 
by using the misspelled term "lyprosy" 
followed by 45 other access points that 
either retrieved material that was too 
broad (using queries such as "microbiol- 
ogy," "skin diseases," and "skin growth") 
or that failed to retrieve any material due
to other spelling errors or collection fail- 
ure, e.g., "lepors," "lyprosy" (entered mul- 
tiple times), and "hansen's disease," and a 
search in which a user entered the mis- 
spelled query "mideival art" three times, 
received no guidance from the system as 
to the correct spelling of the misspelled 
query word "mideival," and then walked 
away. In view of these two users' behavior, 
we can speculate that neither user knew 
that the root of the problem was a mis- 
spelled query. 

On one hand, we can continue to allow 
our online catalogs to fail our users in view 
of the infrequency of spelling errors. On 
the other hand, we can also make rather 
simple enhancements to our existing on- 
line catalogs to help users overcome mis- 
spelled queries. Here are three sugges- 
tions. 

First, online catalogs should be 
equipped with search trees to place the 
burden of selecting a subject-searching 
approach in response to user queries on 
the system instead of on users. An empiri- 
cal study of search-tree effectiveness 
demonstrated that the search trees were 
more effective in selecting a subject- 
searching approach that would produce 
useful information for the subjects users 
seek than users would select on their own 
(Drabenstott and Weller 1995). Search 
trees considerably reduce search-approach
failures. These failures are the direct
result of the failure of a particular
search approach to produce useful retrievals
in response to user queries. Search
trees enlist all search approaches in a de- 
liberate sequence that begins with con- 
trolled vocabulary approaches that are 
more likely than free-text approaches to 
retrieve relevant material. Search trees 
also include tactics that are intended to 
overcome spelling errors. For example, 
search trees check the individual words in 
non-name elements to determine 
whether they produce retrievals. While 
this tactic is intended to conserve system 
resources connected with keyword 
searching, it produces intermediary re- 
sults that are useful to online catalog us- 
ers, because if one or more words in a 
query that is submitted to keyword 
searching fail to produce retrievals, key- 
word and implicit Boolean searching will 
also fail. Thus, it makes sense for the sys- 
tem to report intermediary results to users 
so that they can decide what to do with the 
offending words. Another example is the 
use of alphabetical searching for subject 
queries bearing personal names only or a 
combination of personal-name and non- 
name elements that fails to produce re- 
trievals for both name and non-name ele- 
ments. The alphabetical approach gives 
users the opportunity to browse backward 
and forward in alphabetical lists to find 
the desired names. 

Second, when systems are unable to 
produce retrievals for elements of user 
queries, they should inform users and sug- 
gest one or more correct spellings of the 
possibly misspelled word. In view of the 
popularity of word-processing programs 
that have such spelling-correction rou- 
tines at the present time, online catalog 
users might come to expect such assis- 
tance from online catalogs. 
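
One way a system might suggest correct spellings is to compare the unposted word with words that are posted in the catalog's keyword index. The sketch below uses the similarity matching in Python's standard difflib module; this is a stand-in technique chosen for illustration, not the method of ASTUTE or of any particular word-processing program, and the posted-word list and cutoff value are hypothetical.

```python
from difflib import get_close_matches

def suggest_spellings(word, posted_words, limit=3, cutoff=0.8):
    """Return up to `limit` posted words that closely resemble `word`."""
    return get_close_matches(word.lower(), posted_words, n=limit, cutoff=cutoff)

# Hypothetical words posted in a catalog's keyword index.
posted_words = [
    "vehicles", "communications", "chemistry",
    "abolitionism", "language", "monitoring",
]

vehicle_suggestions = suggest_spellings("vechicles", posted_words)
zirconia_suggestions = suggest_spellings("zirconia", posted_words)
```

A misspelling such as "vechicles" yields the posted word "vehicles" as a suggestion, whereas a correctly spelled but unposted word such as "zirconia" yields no suggestion at all, which is itself a hint that the failure may lie with the collection rather than with the spelling.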

Third, while computer-assisted spell- 
ing routines in online catalogs can help 
users and systems identify misspelled 
words, they cannot distinguish between 
words that fail to produce retrievals be- 
cause of misspellings or collection failure. 
Online catalog indexes could be enhanced 
with words and phrases from dictionaries, 
subject-heading lists, thesauri, and vari- 
ous other specialized and authoritative 
subject vocabularies. When query words
match unposted words in these vocabularies,
this would be an indication that the
failure to produce retrievals was due to 
collection failure and not spelling. Sys- 
tems could even use the knowledge of the 
match to suggest that users search a spe- 
cialized database. For example, suppose 
that words in a user query matched words 
from a specialized dictionary or thesaurus 
in zoology that were not posted in the 
online catalog's database. The system
could use this knowledge to suggest that 
the user search a specialized zoology da- 
tabase or a general science database that 
provides access to abstracting and index- 
ing records to journal articles. 
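
The third suggestion can be sketched as a three-way diagnosis of a query word. The function name, the sample catalog index, and the mapping from reference-vocabulary words to suggested databases are all hypothetical.

```python
def diagnose_word(word, catalog_index, reference_vocabulary):
    """Classify a query word and, where possible, name a database to try next."""
    w = word.lower()
    if w in catalog_index:
        return "posted", None
    if w in reference_vocabulary:
        # Known word with no postings: the collection, not the spelling, failed.
        return "collection failure", reference_vocabulary[w]
    return "possible misspelling", None

# Hypothetical catalog index and enhanced vocabulary with referral targets.
catalog_index = {"chemistry", "computers", "electronics"}
reference_vocabulary = {
    "zirconia": "materials science database",
    "marsupials": "zoology database",
}

results = {w: diagnose_word(w, catalog_index, reference_vocabulary)
           for w in ["chemistry", "marsupials", "elecctronics"]}
```

Only words that are found in neither the catalog nor the reference vocabularies are reported to the user as possible misspellings; words found in a reference vocabulary are reported as collection failures together with a referral.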

SUMMARY

The purpose of this paper is to add to our 
understanding and knowledge of spelling 
errors in online catalog searches based on 
empirical studies of spelling errors in on- 
line catalog searches and suggest ways in 
which systems that detect such errors 
should handle the errors that they detect. 

An empirical study of spelling errors in 
online catalog searches involved a catego- 
rization of user queries for subjects that 
were extracted from four university librar- 
ies' online catalog transaction logs. The 
results of the analysis demonstrated that 
less than 6% of user queries that match 
the catalog's controlled and free-text 
terms contain spelling errors. This per- 
centage does not account for spelling er- 
rors in user queries that fail to match the 
catalog's controlled and free-text terms. It 
was difficult for us to quantify
spelling errors in nonmatching queries
because we were unable to verify certain
terms and phrases. We concluded
that a combination of a number of tech- 
niques (e.g., truncation, matches on fewer 
than all words in queries) would probably 
result in matches that would lead to re- 
trievals. As a last resort, searches of jour- 
nal article abstracts, back-of-the-book in- 
dexes, or tables of contents might produce 
retrievals, but few online catalogs index 
terms from these sources in their data- 
bases. 

An empirical study of online catalog 
use tested a new subject-access design. 
This design featured an online catalog that
had search trees to govern the system's
selection of searching routines in response
to user queries. Search trees determined
the extent to which user queries
matched subject headings and other sub- 
ject-rich terms in bibliographic records. 
This machine-based analysis resulted in 
the selection of a subject-searching approach
that was likely to produce useful
retrievals in response to user queries. 
Failure to effect a match between queries 
and the catalog's vocabulary sometimes 
meant that query words were misspelled. 
The experimental online catalog reported 
such queries to users and asked them to
check their queries for possible spelling 
errors. The results of this interaction be- 
tween system and users demonstrated 
that users responded in several different 
ways to an online catalog that assisted 
them in detecting misspelled queries. 
Some ways resulted in a successful search. 
For example, the system identified a misspelled
word, the user corrected the spelling,
and the system produced useful retrievals
for the corrected query. Some
ways resulted in an unsuccessful search. 
For example, the system identified a mis- 
spelled word, the user did not correct the 
spelling, and, instead, added another 
word or phrase to the query in addition to 
the misspelled word. The experimental 
online catalog detected a total of 134 queries
in which words were possibly misspelled.
Of these queries, only 28 queries
contained misspelled words, and users corrected
16 of these queries. Many of the
remaining 106 queries were not spelled incorrectly.
Instead, they contained words that
were not in the catalog's database and, thus,
were queries that failed due to collection 
failure. Searches of more comprehensive 
databases or records that had more depth 
than records in library cataloging data- 
bases might produce useful retrievals for 
these queries. 
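
The counts reported above can be turned into rates with simple arithmetic. The following Python sketch uses only the three figures given in the text (134 flagged queries, 28 with a real misspelling, 16 corrected by users); the variable names are illustrative and are not taken from the study.

```python
# Counts reported for the experimental online catalog.
flagged = 134      # queries the system flagged as possibly misspelled
misspelled = 28    # flagged queries that really contained a misspelling
corrected = 16     # misspelled queries that users went on to correct

not_misspelled = flagged - misspelled      # flagged, but spelled correctly
share_misspelled = misspelled / flagged    # share of flags that were real errors
share_corrected = corrected / misspelled   # share of real errors users fixed

print(f"Flagged queries with a real misspelling: {share_misspelled:.1%}")
print(f"Misspelled queries corrected by users:   {share_corrected:.1%}")
print(f"Flagged queries not misspelled:          {not_misspelled}")
```

Read this way, roughly one flagged query in five contained a real misspelling, which is why the third recommendation below asks systems to separate misspellings from collection failure.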

We concluded with three recommen- 
dations to improve the responsiveness of 
online catalogs to user queries that may be
marred by spelling errors. First, we rec- 
ommended that online catalogs be 
equipped with search trees to place the 
burden of selecting a subject-searching 
approach in response to user queries on 



the system instead of on users and, thus, 
reduce search-approach failures in sub- 
ject searching. Search trees also utilize 
tactics that are intended to overcome 
spelling errors such as the alphabetical 
approach, which gives users the opportu- 
nity to browse backward and forward in 
alphabetical lists to find the desired 
names. Second, we recommended that 
systems be equipped with automatic spell- 
ing-detection routines that, at the very 
least, inform users of a possibly misspelled 
word or words. Third, we recommended 
that online catalogs be enhanced with 
tools and techniques to distinguish be- 
tween queries that fail due to misspellings 
and collection failure. 
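
One way to picture the second and third recommendations is the sketch below. It is only an illustration under stated assumptions: the paper does not prescribe a technique, the two word lists (`catalog_vocab` and `general_vocab`) are hypothetical, and the standard-library `difflib.get_close_matches` function stands in for whatever spelling-detection routine a real system would use.

```python
from difflib import get_close_matches

def classify_query_word(word, catalog_vocab, general_vocab, cutoff=0.8):
    """Classify one query word; all categories here are illustrative."""
    w = word.lower()
    if w in catalog_vocab:
        return ("match", [])
    if w in general_vocab:
        # A real word that the catalog simply does not contain.
        return ("possible collection failure", [])
    # Not a known word: look for similar catalog terms to suggest.
    suggestions = get_close_matches(w, sorted(catalog_vocab), n=3, cutoff=cutoff)
    if suggestions:
        return ("possible misspelling", suggestions)
    return ("unverified", [])

# Hypothetical vocabularies for demonstration only.
catalog_vocab = {"psychology", "education", "catalogs"}
general_vocab = catalog_vocab | {"volcanology"}

for term in ["psycology", "volcanology", "xqzt"]:
    print(term, classify_query_word(term, catalog_vocab, general_vocab))
```

The design choice is to consult a broader word list before suggesting corrections, so that a correctly spelled word absent from the catalog is reported as a likely collection failure rather than as a spelling error.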

In closing, we caution that spelling is 
not a serious problem in subject retrieval,
but, unfortunately, a problem as simple as
spelling can completely derail the most
routine subject search. Users expect and 

spelling-correction routines in off-the-shelf word-
processing software. Isn't it time to pro- 
vide them with spelling correction in 
online catalog searching? 

Tables of Contents in Library
Catalogs: A Quantitative 
Examination of Analytic 
Catalogs 

Claus Poulsen 



Easy access to tables of contents from vendors and the technological devel-
opment of optical character reading have actualized access to articles in
books via tables of contents in library catalogs. From earlier studies we know
that analytic book catalogs can provide access to up to 600% more works
than the traditional catalog by simply adding analytics for works in compos-
ite works to the catalog. In this study we examine the proportion of composite
works and the number of articles in these books in two different university
libraries. The influences of library type, publication language, subject field,
and date of publication are examined, and the results are compared to
previous studies. The proportion of composite works is between 10% and
20%. The number of articles in the composite works varies from 20 to 30
articles per book, highest for the sciences and the English-language publi-
cations and lowest for the social sciences.



Access to articles from composite
works has always been valued by users. 
Whereas journal articles are well indexed 
through national and international ser- 
vices, it is not easy for the user to get 
access to articles in books. Some indexes 
make analysis of some composite works. 
Some vendors offer access to tables of 
contents from many books. But since the
publication patterns for books are geo- 
graphically, institutionally, and culturally 
much more heterogenous than for jour- 
nals, the value of these indexes is limited. 
A natural improvement of library ser- 



vices, therefore, would be access to arti- 
cles in books via the online catalog. The 
possibility of retrieving electronic tables 
of contents from several vendors and the 
development of optical character reading 
technologies make it realistic to imple- 
ment such a service. 

To grasp the potential of such an im- 
provement we will study the number of 
articles from composite works that would 
be accessible in such an analytic catalog. 
In other words, how many extra citable 
references for works are added to the on- 
line catalog via articles in books? 



Claus Poulsen is Research Librarian, Roskilde University Library, Denmark (e-mail:
the subject. Manuscript received June 16, 1995; revised December 20, 1995; accepted for
publication January 13, 1996. 
Enhanced Catalog Records

Traditional library catalogs do not offer 
access to articles in books. The biblio- 
graphic records for composite works do 
not contain access points for the authors 
and titles from the articles that make up 
the book. In an analytical catalog the re- 
cords are so enhanced, which provides 
direct access to author and title citations 
for articles in books. 

For years studies have indicated the 
desirability of improving access to biblio- 
graphic records in library catalogs through 
the provision of contents notes, tables of 
contents, abstracts, and other methods 
(Van Orden 1990). All studies show an 
increased recall in subject searching when 
contents notes are added. 

The traditional question is whether di- 
minished precision from the addition of 
contents information can invalidate the 
increased recall. Some studies show that 
precision does not suffer from this addi- 
tion of contents information (Atherton et 
al. 1978; Cochrane 1985; Byrne and
Micco 1988; Beatty 1991). Other studies 
show decreasing precision (Dillon and 
Wenzel 1990; Knutson 1991). For the 
time being, apparently, no unambiguous 
answers exist to the question put in such 
a general way. 

Another substantial question is 
whether retrieving a large number of con- 
tents notes would cause information over- 
load. By increasing the natural text vol- 
ume dramatically, the number of records 
retrieved in a topical search will increase 
(Lancaster, Elliker, and Connell 1989). 
This was demonstrated by Byrne and
Micco in a pilot study and later reaffirmed 
by Beatty in a full-scale study (Byrne and 
Micco 1988; Beatty 1991). By adding con- 
tent-bearing words from tables of con- 
tents to the catalog, they reported a 300% 
increase in the number of items found. 
One consequence could be that improved 
analytic access might render the most 
popular strategy to overcome information 
overload no longer useful — topical 
searching using title words in books (Lar- 
son 1991). 

Apart from the possibility of informa- 
tion overload, this problem will become 



increasingly important as the volume of 
text in library catalogs increases. Conse- 
quently, we have to develop a strategy for 
coping with online catalogs containing in- 
creasing numbers of records, which in 
turn have been enhanced with an increas- 
ing amount of description of the contents 
of the books. One proposed strategy is to 
separate the enhanced descriptions and 
offer the choice between traditional sub- 
ject searching in classification, keywords, 
and titles or searching where the detailed 
subject descriptions from the enhanced 
records are added (Beatty 1991). An- 
other — more radical — proposal has been 
put forward independently by Lancaster 
et al. (1991) and Poulsen (1990). In both 
papers, the authors suggest improving 
subject access in library catalogs by an 
enhanced subject description combined 
with a reduction of the number of records 
meant for this enhancement. The selected 
records for this subcatalog represent the 
encyclopedic or the survey literature. 

Quantitative Results from 
Previous Studies of Analytic 
Book Catalogs 

Analytic book catalogs are catalogs en- 
hanced by authors and titles from articles 
in composite works. Composite works are 
books with two or more distinct works by 
the same or different authors. Conse- 
quently, a composite work consists of two 
or more citable works. Edited works, an- 
thologies, and conference proceedings 
are examples. 

To show the potential for enhancing 
the online catalog in this way requires an 
examination of the number of composite 
works in the library and the number of 
articles or citable works in these books. 

Hoffman and Magner (1985), working 
with a sample of 4,094 books in the Santa 
Ana College Library, found that 21.3% 
were composite works, containing an av- 
erage of 31.2 articles or citable works 
each. In other words, an analytical catalog 
could add access to 600% more citable 
works than the traditional library catalog! 

Their result is supported by a sample 
of 446 items in the 1982 cumulation of the 
American Book Publishing Record show- 
ing 22.2% multiple-work documents or
composite works (Hoffman and Magner 
1985, 152). 

Although many authors have studied 
catalogs with enhanced contents informa- 
tion since the Hoffman and Magner study, 
they rarely specified the number of com- 
posite works and citable works in the cata- 
log (Byrne and Micco 1988; Beatty 1991; 
Michalak 1990). Only Weintraub and 
Shimoguchi (1993) did so in their study of 
catalog record enhancement. They re- 
ported that 44 out of a sample of 375 
books from the San Diego University Li- 
brary were composite works. The selec- 
tion was made from a broad subject field, 
dominated by language and literature and 
medicine (see table 1). Further, they re- 
ported 1,978 citable references in these 
44 books. A recalculation of their statistics
shows that 12% ± 3% of the books are
composite works with an average of 45 ± 22
articles in these composite works. The
confidence intervals are at the 95% level.
If we calculate the 95% confidence
interval in the Hoffman and Magner study
we get 21.3% ± 1.3% composite works to



be compared to 12% ±3% composite 
works in the Weintraub and Shimoguchi 
study. 
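
The recalculated proportions can be checked with a short sketch. It assumes the usual normal approximation for a binomial proportion, which the text does not name explicitly; the function name is illustrative. The ± 22 on the mean number of articles cannot be recomputed here because it depends on per-book counts that are not reported.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution

def ci95_half_width(p, n):
    """Half-width of a normal-approximation 95% interval for a proportion."""
    return Z_95 * math.sqrt(p * (1 - p) / n)

# Weintraub and Shimoguchi: 44 composite works among 375 books,
# with 1,978 citable references in those 44 books.
p_ws = 44 / 375
hw_ws = ci95_half_width(p_ws, 375)
mean_articles_ws = 1978 / 44

# Hoffman and Magner: 21.3% composite works in a sample of 4,094 books.
hw_hm = ci95_half_width(0.213, 4094)

print(f"{p_ws:.0%} +/- {hw_ws:.0%}, mean {mean_articles_ws:.0f} articles")
print(f"21.3% +/- {hw_hm:.1%}")
```

Under this assumption the sketch reproduces the 12% ± 3% and 21.3% ± 1.3% figures quoted in the text, and a mean of about 45 articles per composite work.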

The difference between these results 
is significant. It might be attributed to the 
different subject fields of the examined 
books. Weintraub and Shimoguchi's study
indicates a difference in the number of 
articles in composite works between dif- 
ferent subject fields (Weintraub and 
Shimoguchi 1993, 176). The difference 
can also be due to differences in library 
type, acquisition policy, and the age of the 
collection. And finally the two studies 
show methodological differences. 

The explicitly described criteria for se- 
lection of composite works show a differ- 
ence. Weintraub and Shimoguchi use the 
information in the table of contents exclu- 
sively, whereas Hoffman and Magner 
could increase the number of "multiwork 
documents" by looking inside the book 
(Weintraub and Shimoguchi 1993, 170, 
172-73; Hoffman and Magner 1985, 152). 

These results raise new questions. Can 
we predict the number of composite 
works and of articles in these books? Are 



TABLE 1

Composite Works in Library Catalogs

Library                       Selection                  No. Books   Composite     Mean No. Articles in
                                                         Examined    Works (%)     Composite Works

Santa Ana College*            All books                  4,098       21.3 ± 1.3†   31
San Diego State Univ. Lib.‡   P-PT, Q-QR, RA-RC,         375         12 ± 3||      45 ± 22#
                              T-TK in LC class.§
Nat. Lib. Educ.               PY<1990                    718         16 ± 3        20 ± 4
Nat. Lib. Educ.               PY≥1990                    210         17 ± 5        17 ± 5
Nat. Lib. Educ.               English language and       496         15 ± 3        29 ± 9
                              PY=1980:1990
Roskilde Univ.                Social sciences and        887         24 ± 3        17 ± 3
                              PY=1988:1992
Roskilde Univ.                Science and                698         18 ± 3        28 ± 5
                              PY=1988:1992

All ± intervals are 95% confidence intervals.
*  Hoffman and Magner 1985.
†  See appendix.
‡  Weintraub and Shimoguchi 1993.
§  The contributions are dominated by language and literature (60%) and medicine (24%) with no or
   negligible contributions from social sciences, science, and technology.
|| See appendix.
#  See appendix.

there real differences between library col-
lections? Are there differences between 
collections in broad universal libraries 
versus specialized libraries? Do collec- 
tions vary by subject fields, publication 
languages, and the age of the library hold- 
ings? Depending on the answers, are 
there 10% or 20% composite works in a 
given library and do these books contain 
20 or 50 articles each? Will the proportion 
of citable references in your catalog in- 
crease by 200% or 1,000% if you include 
access to articles in books? The answers 
are critical for planning. 

The Present Study 

Using samples, we try to illustrate the 
degree to which the proportion of com- 
posite works is dependent on library hold- 
ings in terms of broad universal holdings 
versus specialized holdings, subject field, 
publication language, and the age of the 
holdings. 

Study Libraries 

We limited our study to university librar- 
ies, which may have general or specialized 
collections, categories that are not mutu-
ally exclusive. Would there be a difference
in the volume of analytic catalogs in the 
two types of libraries? Roskilde University 
Library in Denmark, covering sciences, 
technology, social sciences, and the hu- 
manities, is not a specialized library but a 
general library. Roskilde University Li- 
brary was founded in 1972 and has ap- 
proximately 450,000 volumes. On the 
other side we have an extremely special- 
ized library, the Special Collection at the 
National Library of Education in Den- 
mark, covering exclusively education and 
psychology. This collection was founded 
in 1887 and has approximately 350,000 
volumes that are split into two collections: 
those published before and after 1990. 

Sampling Technique 

Two sampling techniques (one for each 
library) were required to achieve random 
samples. In the National Library of Edu- 
cation we selected using physical place- 



ment on the shelves ("book number four 
from the left on each shelf"). Only the
actual volume selected was examined, 
even if it was part of a multivolume work. 

To sample from the three subject li- 
braries at Roskilde University Library, we 
used machine-generated numbers of bib- 
liographic records ending at random digits 
(e.g., records ending with the digits "82"). 
In this case, if records represented a mul- 
tivolume work all volumes were in- 
spected. 
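
The record-number rule used at Roskilde can be sketched as follows. This is a minimal illustration only: the record numbers are hypothetical, and the helper name `sample_by_suffix` is not from the study.

```python
import random

def sample_by_suffix(record_numbers, suffix):
    """Keep the record numbers whose decimal digits end with `suffix`."""
    return [n for n in record_numbers if str(n).endswith(suffix)]

# Hypothetical catalog with record numbers 1 through 1000.
records = range(1, 1001)

# Draw a two-digit ending at random, as in the "82" example in the text.
suffix = f"{random.randrange(100):02d}"
sample = sample_by_suffix(records, suffix)
print(suffix, len(sample))
```

Selecting on a two-digit ending picks out about one record in a hundred, so the sample size scales with the size of the catalog.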

The two sampling techniques yield the 
same total number of articles or citable 
works in the catalog. The numbers of com- 
posite works and of articles within com- 
posite works vary depending on whether 
we count volumes or records; however, as 
the multivolume records constitute only 
1% to 2% of the records in the analyzed 
library catalogs, we make no distinction 
here between the results from the two 
sampling methods because the resulting 
errors are smaller than the 95% confi- 
dence interval (see table 1). 

It is assumed that none of these selec- 
tion rules correlate with the parameters 
selected for this investigation: the propor- 
tion of composite works and the number 
of articles in these books. Books not found 
on the shelves were reserved for the study 
for four months. The period of loan was 
one month. If they were not received after 
four months, we excluded them. 

To count the composite works, we 
counted only books containing tables of 
contents with at least two separately titled 
works by the same or different authors. To 
count the works, we counted separately 
titled and authored works in the tables of 
contents. No restrictions were made on 
the number or the length of works 
counted. Introductions, prefaces, and 
other generic articles were excluded. 

Date of Publication 

If either publication or acquisition praxis
has changed over time, this could influ-
ence the proportion of composite works in
the library and the average number of 
articles in these books. Therefore the 
books from the National Library of Edu- 
cation were divided into two categories: 
books published before and after 1990, to
learn whether there is any difference be-
tween the proportions of composite works 
and articles. 

Language 

Because the publication language for in- 
ternational congresses is predominantly 
English, books in English were examined 
separately in the National Library of Edu- 
cation to see whether the number of com- 
posite works or the number of articles has 
been influenced by the publication lan- 
guage. 

Subject Field 

Previous studies indicate significant vari- 
ations between different subject fields 
with respect to the volume of composite 
works and the number of articles in these 
composite works (Weintraub and Shi- 
moguchi 1993, 176). But their data analy- 
sis indicates also that the sample for inves- 
tigation, though selected randomly, might 
not be typical for the total collection. They 
consequently ask for further studies of the 
influence of subject field. We compared 
science, social sciences, and the humani- 
ties with respect to the volume of compos- 
ite works and the number of articles in the 
composite works. To do this we looked at 
a comparison of the holdings of science 
and social sciences books at Roskilde Uni- 
versity Library and the essentially human- 
istic holdings of books at the National 
Library of Education. 

Results 

The results are presented in table 1, with 
the results from the Santa Ana College 
Library study (Hoffman and Magner 
1985) added in the first row and the re- 
sults from the San Diego State University 
Library study (Weintraub and Shi- 
moguchi 1993) added in the second row. 
The statistical deviation measures are the 
conventional 95% confidence interval. 

The studies of the two Danish univer- 
sity libraries with respect to date of publi- 
cation, publication language, specialized 
versus general libraries, and subject field 



indicate no or very weak dependence on 
these parameters. Only the difference be- 
tween the mean number of articles in 
composite works in the sciences and the 
social sciences at Roskilde University Li- 
brary proves to be significant within the 
95% confidence interval. 
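
The comparison can be illustrated with the rounded values from table 1. The sketch below assumes that two estimates are called significantly different when their 95% confidence intervals do not overlap; the text does not state the exact test used, so this criterion is an assumption, chosen because it agrees with the conclusion reported above.

```python
def intervals_overlap(mean_a, half_a, mean_b, half_b):
    """True if the intervals mean_a ± half_a and mean_b ± half_b overlap or touch."""
    return abs(mean_a - mean_b) <= half_a + half_b

# Mean number of articles per composite work, Roskilde University Library.
science_vs_social = intervals_overlap(28, 5, 17, 3)

# Proportion of composite works (%), Roskilde University Library.
proportion_science_vs_social = intervals_overlap(18, 3, 24, 3)

# Proportion of composite works (%), National Library of Education,
# books published before 1990 versus 1990 and later.
proportion_by_date = intervals_overlap(16, 3, 17, 5)

print("Articles, science vs. social sciences overlap:", science_vs_social)
print("Proportion, science vs. social sciences overlap:", proportion_science_vs_social)
print("Proportion, before vs. after 1990 overlap:", proportion_by_date)
```

Only the first comparison yields non-overlapping intervals, which matches the single significant difference noted in the text. Because the table values are rounded, borderline cases such as 18 ± 3 against 24 ± 3 should be read with caution.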

Discussion and Conclusion 

The results from the two American 
studies and the present study are expected 
to reflect not only the influences under 
consideration from subject fields, publica- 
tion language, publication date, and spe- 
cialized versus general libraries, but also 
the influence of acquisition policies at the 
different libraries. In light of the immense 
differences between the four libraries, the 
relatively constant character of the results 
is striking. The proportion of composite 
works is between 10% and 20%. The num-
ber of articles in the composite works 
varies from 20 to 30 articles per book — 
highest for the sciences and the English- 
language publications and lowest for the 
social sciences. This implies that the li- 
braries under consideration can add ac- 
cess to between 200% and 600% more 
works to their catalog without buying one 
book more, just by adding the tables of 
contents of their composite works. This is 
a challenge and an immense increase in 
the number of access points in the catalog. 
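
The size of the increase follows from multiplying the share of composite works by the mean number of articles they contain. The sketch below applies that product to the rounded point estimates in table 1; it ignores the confidence intervals, and the function name is illustrative.

```python
# (library and selection, composite works in %, mean articles per composite work)
rows = [
    ("Santa Ana College, all books", 21.3, 31),
    ("San Diego State Univ. Lib.", 12, 45),
    ("Nat. Lib. Educ., PY<1990", 16, 20),
    ("Nat. Lib. Educ., PY>=1990", 17, 17),
    ("Nat. Lib. Educ., English language", 15, 29),
    ("Roskilde Univ., social sciences", 24, 17),
    ("Roskilde Univ., science", 18, 28),
]

def extra_works_percent(composite_pct, mean_articles):
    """Added citable works as a percentage of the books in the catalog."""
    return (composite_pct / 100) * mean_articles * 100

increases = {name: extra_works_percent(pct, art) for name, pct, art in rows}
for name, value in increases.items():
    print(f"{name:36s} about {value:.0f}% more works")
```

With these point estimates the products run from roughly 290% to roughly 660%, which is broadly in line with the range quoted in the text.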

But the improved description of the 
composite works — because of the large 
amount of text in the individual biblio- 
graphic records — may introduce low pre- 
cision and information overload if ana- 
lytics are simply added to the catalog. 
Therefore, most libraries could profit by 
handling this improvement in separate 
files to be accessed by the choice of the 
user. 



Works Cited 

Atherton, P., and others. 1978. Books are for 
use. Final report of the Subject Access Pro- 
ject to the Council on Library Resources. 
Subject Access Project, School of Informa- 
tion Studies, Syracuse University, New 
York Report IST-10. 




Beatty, S. 1991. ESP at ADFA after five years. 
Cataloguing Australia 17, no. 3/4: 65-92. 

Byrne, A., and M. Micco. 1988. Improving OPAC 
subject access: The ADFA experiment. Col-
lege & research libraries 49: 432-41.

Cochrane, P. Atherton. 1985. Redesign of cata- 
logs and indexes for improved online sub- 
ject access. Selected papers of Pauline A. 
Cochrane. Phoenix: Oryx. 

Dillon, M., and P. Wenzel. 1990. Retrieval 
effectiveness of enhanced bibliographic 
records. Library hi tech 31, no. 3: 43-46. 

Hoffman, H. H., and J. L. Magner. 1985. Fu- 
ture outlook: Better retrieval through ana- 
lytic catalogs. The journal of academic li- 
brarianship 11: 151-53. 

Knutson, G. 1991. Subject enhancement: Re- 
port on an experiment. College & research
libraries 52: 65-79.

Lancaster, F. W., and others. 1991. Identifying 
barriers to effective subject access in li- 
brary catalogs. Library resources & techni-
cal services 34: 377-91.

Lancaster, F. W., C. Elliker, and T. H. Connell.
1989. Subject analysis. Annual review of 
information science and technology 24: 
35-84. 

Larson, R. R. 1991. The decline of subject 
searching: Long-term trends and patterns 
of index use in an online catalog. Journal 
of the American Society for Information 
Science 42: 197-215. 

Michalak, T. J. 1990. An experiment in en- 
hancing catalog records at Carnegie Mel- 
lon University. Library hi tech 8, no. 3: 
33-41. 

Poulsen, C. 1990. Subject access to new sub- 
jects, specific paradigms and surveys: 
PARADOKS-registration. LIBRI 40: 179- 
202. 

Van Orden, R. 1990. Content-enriched access 
to electronic information: Summaries of 
selected research. Library hi tech 8, no. 3: 
27-32. 

Weintraub, T. S., and W. Shimoguchi. 1993. 
Catalog record contents enhancement. Li-
brary resources & technical services 37:
167-80. 






Reshelving Study of Review 
Literature in the Physical 
Sciences 

Nancy J. Butkovich 

Review publications contain articles that give overviews or state-of-the-art
reports on specific topics. Although some review titles are published more
frequently, many appear only once a year. At the Physical Sciences Library
of the Pennsylvania State University's University Park Campus, a year-long
reshelving study of the review publications collection was undertaken to
determine usage of the titles. This need was fueled by a lack of shelf space,
storage considerations, and the threat of serials cancellations. Three hundred
review titles were examined. The best data were found in classes QC, QD,
and QH-QP (monographic series only). The other classes had few titles or
low use tallies. Approximately half of all titles were used at least once.
Periodicals had a higher percentage of use than did monographic series.



Scientific review publications provide a
medium for substantive articles by giving 
overviews or state-of-the-art reports on 
specific topics. Many of these are annuals; 
others are review journals that are pub- 
lished more frequently. A study by
Woodward and Hensman (1976) indicates
that most of the top thirty scientific pub- 
lications listed in Science Citation Index's 
"Journal Citation Reports," when ranked 
by impact factor, are review publications. 
An examination of the top thirty titles in- 
cluded in the 1992 "Journal Citation Re- 
ports — Journals Ranked by Impact Fac- 
tor" (ISI 1993) indicates that the contents 
of over half the titles are primarily review
articles. 

In the Physical Sciences Library of 
Pennsylvania State University's University 



Park campus, these titles had been segre- 
gated in a separate, noncirculating refer- 
ence collection. At the time of this study 
the physical sciences branch library had 
approximately 88,000 volumes and 850 
current serial subscriptions. There were 
eight separate reference collections, in- 
cluding the one in this study, as well as 
separate monograph and journal collec- 
tions. The major subject areas were as- 
tronomy, physics, and chemistry. Secon- 
dary subject areas included biochemistry, 
pharmaceutical chemistry, chemical engi- 
neering, and medicinal chemistry. 

A number of factors provided the im- 
petus for this survey. First, the collection 
was about to fill the designated shelf 
space, and no additional space was avail- 
able. The most obvious solution to this 
space problem was to incorporate these
publications into other parts of the collec- 
tion. Such a shift would also simplify ac- 
cess to the collection by reducing the 
number of reference collections a person 
would have to check in order to find a 
desired title. 

The monograph and the journals col- 
lections were also reaching their total ca- 
pacities, so plans were being made for 
moving a portion of the collections to re- 
mote storage facilities. Use data, particu- 
larly concerning the age of the material 
used, could help justify moving certain 
titles either in part or in total to a remote 
storage facility, a situation similar to that 
described by Naylor (1993, 28) and Rice 
(1979, 35, 36). 

Because the collection was noncircu- 
lating, no use data were available for it, and
once the collection was merged into the 
journal and monograph collections, the 
titles would become anonymous. There- 
fore, another use for the data involved the 
need to have a list of low-use titles avail- 
able in the event that a serials cancellation 
project was necessary. This possible use 
for the data is not unique to this study. 
Naylor (1993, 28) and Swigger and Wilkes 
(1991, 41, 42) reported similar reasons for 
conducting use studies of serial titles. 
Bustion and Treadwell used reshelving 
data to evaluate the reliability of faculty 
use surveys that had been used as the basis 
for a serials cancellation project (1990, 
142-43). 

The data from this project would be 
combined with results from faculty sur- 
veys, SciSearch rankings of publications 
cited by Penn State faculty, and other 
sources of information to identify titles for 
future serial cuts. Although each method 
has weaknesses that are well documented 
in the literature (Rice 1979, 36-37; 
Swigger and Wilkes 1991, 42-44), the 
combination of the data from these differ- 
ent methods would provide a useful meas- 
ure of the use of these titles. 

Reshelving statistics are often used as 
a measure of collection use (Rice 1979; 
Swigger and Wilkes 1991; Naylor 1990, 
1993, and 1994). The method chosen for 
this study is similar to the "sweep" method 
described by Naylor, who found that this 



method produced higher use values than 
did a second method that required pa- 
trons to mark their usage on labels at- 
tached to the journal covers (1993, 30, 62; 
1994, 373-74, 378). 

Methodology 

Data were collected in the following man- 
ner: 

• A book cart was placed next to the 
Reference Review collection and was 
labeled with a sign asking patrons not 
to reshelve review titles. 

• One staff member was given the re- 
sponsibility of marking and reshelving 
these titles. Other staff and student 
employees were instructed not to 
reshelve this material unless they had 
instructions for marking the material. 
Instead, they were to leave items on 
the cart for the staff member assigned 
to the project to mark and reshelve. 

• Each item was marked on the inside 
front cover prior to being reshelved. 
This location was chosen over marking 
the spine or outside cover in order to 
reduce the chance of data being lost. 

• The project was conducted for a pe- 
riod of one year, from October 1991 to 
October 1992. All titles in the collec- 
tion were included, regardless of 
whether or not they were new or cur- 
rent subscriptions or ceased or can- 
celed titles. 

• At the end of the project, the data 
were tabulated. 

Although Ross reported success in an 
unobtrusive study of patron browsing be- 
havior (1983, 269-76), no effort was made 
to account for materials reshelved by pa- 
trons. As Naylor noted, "One assumes that 
this type of behavior is proportional for all 
journals" (1990, 9). Given the staffing 
limitations and the fact that faculty have 
keys that give them twenty-four-hour ac- 
cess to the Physical Sciences Library, any 
attempt to obtain this information would 
be impractical and prohibitively costly. 

Results 

A total of 300 titles was included in this 
project. These titles were broken down 
into periodicals and series. According to
the ALA Glossary of Library and Infor- 
mation Science (Young 1983, 166), "peri- 
odical" is defined as: 

A serial appearing or intended to appear at 
regular or stated intervals, generally more 
frequently than annually, each issue of
which is numbered or dated consecutively 
and normally contains separate articles, 
stories, or other writings. 
The same source (Young 1983, 204) 
defines "series" as: 

A group of separate bibliographic items 
related to one another by the fact that each 
item bears, in addition to its own title 
proper, a collective title applying to the 
group as a whole. 

Of the 300 titles, 250 were series, most 
of which were either monographic series 
or annual reviews, and 50 were peri- 
odicals. A few titles were sets with one 
publication date. These were included 
with the monographic series. Approxi- 
mately 50% (151 titles) were used at least 
once during the study period. Of the 250 
monographic series, 110 (44%) were 
used, and of the 50 periodicals, 41 (82%) 
were used (see table 1 for a more detailed 
breakdown of usage, sorted by Library of 
Congress classification). 
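
The totals quoted above can be recovered from the counts in table 1. The sketch below treats blank cells in the table as zero, which is an assumption, and uses illustrative variable names.

```python
# (LC class, series held, series used, periodicals held, periodicals used)
# Blank cells in table 1 are entered as 0 here (an assumption).
table1 = [
    ("HV",      1,   0,  0,  0),
    ("QB",      3,   2,  2,  1),
    ("QC",     39,  16, 12, 11),
    ("QD",    154,  71, 29, 23),
    ("QH-QP",  24,   9,  0,  0),
    ("RA-RS",  11,   4,  3,  2),
    ("TA-TP",  18,   8,  4,  4),
]

series_held = sum(row[1] for row in table1)
series_used = sum(row[2] for row in table1)
periodicals_held = sum(row[3] for row in table1)
periodicals_used = sum(row[4] for row in table1)

titles_held = series_held + periodicals_held
titles_used = series_used + periodicals_used

print(f"Series used:      {series_used}/{series_held} = {series_used / series_held:.0%}")
print(f"Periodicals used: {periodicals_used}/{periodicals_held} = {periodicals_used / periodicals_held:.0%}")
print(f"All titles used:  {titles_used}/{titles_held} = {titles_used / titles_held:.1%}")
```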

The greatest usage of the collection 
based on the number of titles held was in 
the QC and QD ranges (physics and 
chemistry). However, over 50% of titles in 
the QBs (astronomy) and TA-TP (engi- 



neering and technology) were used during 
the survey period. 

An examination of the individual uses 
of each title (see table 2) also shows a high 
rate of use in the QH-QP area (life sci- 
ences) in addition to the QCs and QDs. 
High average use per title was found 
among monographic series in the QBs (as- 
tronomy) and review periodicals in the 
TA-TP (engineering). 

These data become even more impres-
sive when the number of titles used, in-
stead of the number of titles held, is
compared with the number of uses. For
instance, the QBs would have an average 
of 11 uses per title, instead of the average 
of 7 uses per title if all titles held were 
counted. Titles that are broadly categorized 
as medicine (RA-RS) showed relatively light 
use. The one forensic science title (HV) is 
omitted from this table, because it was not 
used during the study period. 
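
The two averages contrasted above differ only in the denominator. The sketch below computes both for the monographic series, taking titles held and titles used from table 1 and uses from table 2; the dictionary layout is illustrative.

```python
# Monographic series: LC class -> (titles held, titles used, total uses)
series = {
    "QB":    (3,   2,  22),
    "QC":    (39,  16, 70),
    "QD":    (154, 71, 497),
    "QH-QP": (24,  9,  195),
    "RA-RS": (11,  4,  12),
    "TA-TP": (18,  8,  30),
}

def uses_per_title(lc_class):
    """Return (uses per title held, uses per title actually used)."""
    held, used, uses = series[lc_class]
    return uses / held, uses / used

for lc_class in series:
    per_held, per_used = uses_per_title(lc_class)
    print(f"{lc_class:6s} {per_held:5.1f} per title held, {per_used:5.1f} per title used")
```

For the QBs this gives about 7 uses per title held and 11 uses per title used, the two figures cited in the text.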

Because the possibility exists of using 
these data for remote storage considera- 
tions, knowing the age of the materials 
being used becomes important. The usage 
of monographic series and review peri- 
odicals sorted by LC classification and by 
date of publication appears in tables 3 and
4. When more than one year was included 
in a physical volume, the date used was the 
oldest included in that volume. LC classes 
that had fewer than 50 uses per document 
type were not included, since the data sets 
were too small to give meaningful results. 



TABLE 1

Number of Titles and Number of Titles Used
(Arranged by LC Class)

             Totals        Monographic Series        Review Periodicals
LC Class     No. Titles    No. Titles   No. Used     No. Titles   No. Used

HV             1             1
QB             5             3            2            2            1
QC            51            39           16           12           11
QD           183           154           71           29           23
QH-QP         24            24            9
RA-RS         14            11            4            3            2
TA-TP         22            18            8            4            4

TABLE 2

Total Number of Titles Held and Individual Uses of Those Titles
(Arranged by LC Class)

             Monographic Series             Review Periodicals
LC Class     No. Titles Held   No. Uses     No. Titles Held   No. Uses

QB             3                22            2                 1
QC            39                70           12                65
QD           154               497           29               419
QH-QP         24               195
RA-RS         11                12            3                 3
TA-TP         18                30            4                23



TABLE 3

Usage of Monographic Series
(Sorted by LC Classification and by Date of Publication)

                          LC Classification
Date           QC              QD               QH-QP

1988-1992      34.3  (24)      26.6  (132)      28.7  (56)
1983-1987       8.6  (6)       21.7  (108)      30.8  (60)
1978-1982      25.7  (18)      17.7  (88)       25.6  (50)
1973-1977      11.4  (8)       13.1  (65)        6.7  (13)
1968-1972       8.6  (6)        9.1  (45)        5.6  (11)
1963-1967       8.6  (6)        6.6  (33)        2.1  (4)
1958-1962       1.4  (1)        2.8  (14)        0.0
1953-1957       1.4  (1)        1.8  (9)         0.5  (1)
1948-1952       0.0             0.4  (2)         0.0
1943-1947       0.0             0.0              0.0
1938-1942       0.0             0.2  (1)         0.0

The total number (n) of uses is listed in parentheses. All columns total 100%.



Limitations of the Methodology 

Although surveys like this are enticing be- 
cause of their simplicity, they do have 
weaknesses that limit the usefulness of the 
data obtained. Some of the major prob- 
lems are listed below. 
• The way in which the item was used is
unknown (Rice 1979, 36; Swigger and
Wilkes 1991, 42).

• The patron may have reshelved the
material rather than leaving it for the
library staff to reshelve (Rice 1979, 36;
Swigger and Wilkes 1991, 42; Naylor
1993, 28).

• A library employee who is not involved
in the project could reshelve
materials without marking them.

In spite of these weaknesses the data
obtained from this sort of project can be
useful, particularly when used in conjunction
with data from other sources. However,
one should keep in mind that the
values obtained are minimums and not
absolutes and that the results from
classification areas represented by few titles
are going to be poor.

TABLE 4

Percentage of Review Periodicals
(Sorted by LC Classification and by Date of Publication)

                  LC Classification
Date          QC             QD
              %     (n)      %     (n)

1988-1992     58.5  (38)     40.8  (171)
1983-1987     21.5  (14)     19.8  (83)
1978-1982      9.2  (6)      12.9  (54)
1973-1977      1.5  (1)       8.6  (36)
1968-1972      4.6  (3)       9.3  (39)
1963-1967      1.5  (1)       2.9  (12)
1958-1962      0.0            1.9  (8)
1953-1957      1.5  (1)       1.2  (5)
1948-1952      1.5  (1)       2.2  (9)
1943-1947                     0.0
1938-1942                     0.2  (1)
1933-1937                     0.0
1928-1932                     0.2  (1)

The number of uses (n) is given in parentheses. The QD percent column totals 100%; the QC column totals
99.8% due to round-off error.

Conclusions 

The best data were found in classes QC, 
QD, and QH-QP (monographic series 
only). The other classes had few titles or 
low use tallies, and so the data were less 
reliable. It should also be noted that these 
data represent large user populations in 
this particular library, and a library serving 
a different mix of clientele would show 
different patterns of use. 

With these sources of error in mind, 
the data can be summarized as follows: 

• Approximately half of all titles were 
used at least once. 

• Periodicals had a higher percentage of 
use than did monographic series, pos- 
sibly because there were more issues. 



• Approximately 80% of the physics 
monographic series and 79% of the 
chemistry series volumes used were 
published since 1973. The area 
broadly categorized as life sciences 
reached 85% by 1978. 

• For periodicals, physics titles reached 
the 80% mark by 1983, and chemistry 
reached 82% of its usage in items pub- 
lished since 1973. 
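
The cumulative figures in the last two points can be reproduced by running totals over the percentage columns of tables 3 and 4. The following is a minimal sketch; the dictionary labels and function name are illustrative, and only the four most recent date ranges are included.

```python
from itertools import accumulate

# Percentage of uses by publication date, newest period first,
# copied from tables 3 and 4.
PERIODS = ["1988-1992", "1983-1987", "1978-1982", "1973-1977"]
USAGE = {
    "QC monographic series": [34.3, 8.6, 25.7, 11.4],
    "QD monographic series": [26.6, 21.7, 17.7, 13.1],
    "QH-QP monographic series": [28.7, 30.8, 25.6, 6.7],
    "QC review periodicals": [58.5, 21.5, 9.2, 1.5],
    "QD review periodicals": [40.8, 19.8, 12.9, 8.6],
}

def cumulative_share(percentages):
    """Running total of usage percentages, newest period first."""
    return [round(total, 1) for total in accumulate(percentages)]

for label, shares in USAGE.items():
    print(label, dict(zip(PERIODS, cumulative_share(shares))))
```

Reading across each printed row shows how far back in time a collection must reach to cover a given share of recorded use, which is the quantity of interest for remote storage decisions.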

Partly as a result of this project, peri- 
odical titles were incorporated into the 
journal collection, while series were 
merged into the monograph collection. 
These series can now circulate. Keeping 
in mind the limitations of the data, these 
results will be used in conjunction with 
data from other sources to identify poten- 
tial items for cancellation lists or for trans- 
fer to remote storage facilities. 
The Structure of the Library 
Market for Scientific Journals: 
The Case of Chemistry 

Stephen J. Bensman 

In this paper, the author analyzes the skewed distributions of price and scientific 
value that constitute the structure of the library market for scientific journals, 
using chemistry as a test case. A numerical index constructed from a survey of 
Louisiana State University chemistry faculty and total citations taken from the 
Science Citation Index Journal Citation Reports were utilized as measures of 
scientific value. Methodological problems arise from the skewed distributions 
customary in library research. The major findings are (1) that scientific value 
does not play a role in the pricing of scientific journals and (2) that little 
relationship consequently exists between scientific value and the prices charged 
libraries for scientific journals. Libraries have the opportunity to implement a 
massive restructuring of their serials collections. A software package named the 
Serials Evaluator is described. Under development at Louisiana State Univer- 
sity, it is software for the automated selection of journals for cancellation and 
remote access through document delivery. 



The Problem 

Librarians live in a world of highly skewed 
statistical distributions. Virtually every- 
thing they see or touch in their work is 
affected by such distributions. These dis- 
tributions lead to the fact that a small 
minority of agents account for the vast 
majority of events. Some good rules of 
thumb are that 10% of the subjects will be 
responsible for some 40% to 50% of the 
observed objects, and 20% will cause 
about 60% to 80%. This phenomenon has 
been documented in — among others — 
the following areas: authorship of articles 
and books; the distribution of articles on
a given subject over journals; citations to
persons, articles, journals, and academic 
departments; and the circulation of library 
materials. Of particular interest in these 
distributions is what can be termed the 
zero or random class, which can constitute 
up to 40% of a given universe of possible 
active agents. Examples of this class are 
potential authors who never or rarely pub- 
lish, articles that are never or rarely cited, 
and library materials that never or rarely 
circulate. Given their nature, these 
skewed distributions appear to be stable 
over time (Bensman 1982; Bensman 
1985). 

Highly skewed distributions are not 
limited to library science but are also 
found in such diverse areas as biology, 
economics, geography, and linguistics. 
Therefore it is not surprising to find that 
journal prices are also extremely skewed. 
This distributional characteristic of jour- 
nal prices is obvious when journal prices 
are ranked in descending order by Library 
of Congress Classification (LCC) subject 
class. Analysis of the data presented in the 
1995 "U.S. Periodical Price Index" (Alex- 
ander and Carpenter 1995) shows that the 
average price of a U.S. periodical per LCC 
subject class ranges from $628.89 in Q 
(Science) to $28.18 in A (General Works) 
and that — after the average prices are 
summed — the top two classes Q (Science) 
and T (Technology), or 10.5% of the nine- 
teen classes, account for about 41% of the 
summed average prices. When the aver- 
age prices are aggregated into total prices 
by multiplying them by the number of 
periodicals in each class, the skew be- 
comes even more pronounced, and class 
Q (Science) alone represents 48.8% of the 
total cost of the sample. 

The highly skewed nature of journal 
prices received quite a bit of publicity in 
the late 1980s, when it was revealed in a
number of studies that academic libraries
conducted of their serials expenditures. The
authors of these studies found a typical 
pattern whereby 10% of the titles were 
responsible for 50% of the serials budget, 
and this pattern was verified at institutions 
as diverse as Kent State, the University of 
Hawaii, Clemson, the University of 
Michigan, and Louisiana State University 
(LSU). At LSU, titles costing $80 or more
constituted only 20% of
the subscriptions but 72% of the serials 
expenditures. However, more disturbing 
was the fact that serials costs were also 
highly skewed when analyzed in terms of 
publishers. The LSU study revealed that 
the top fifty publishers whose titles cost 
the university $2,000 and more accounted 
for only 10% of the serials titles but almost 
50% of the serials budget. Of these fifty 
publishers the top four — Elsevier, 
Springer, Pergamon, and Plenum — re- 
ceived some 23% of the money LSU spent 
on serials (Hamaker 1988; Hamaker 
1987). Similar results were obtained in
studies at the University of Michigan
(Dougherty and Johnson 1988).

The seemingly disproportionate share 
of serials budgets being soaked up by a 
few publishers provoked outrage within 
the library community. Matters were not 
helped by the fact that many of these 
publishers were foreign and responsible 
for a large share of the inflationary price 
increases ravaging library materials bud- 
gets. Hamaker (1988, 211) berated the 
foreign publishers for selling American 
research back to American libraries at 
premium prices and accused them of "in- 
formation colonialism." Meanwhile, 
Dougherty and Johnson (1988) insinuated 
that publisher profit was the driving force 
behind serials prices, hinting openly that 
"the small group of publishers who domi- 
nate commercial publishing have created 
an oligopoly." The outrage culminated in 
two research reports and a series of reso- 
lutions sponsored by the ARL (Associa- 
tion of Research Libraries 1989). In the 
first report, Economic Consulting Ser- 
vices, Inc., analyzed the pricing practices 
of four commercial publishers — Elsevier, 
Pergamon, Springer, and Plenum — and 
concluded that this group had increased 
subscription prices at a much faster rate 
than the rate at which their costs had 
increased. It recommended that the li- 
brary community encourage new entrants 
into serials publishing and stimulate 
greater competition among publishers. In 
the second report Okerson outlined the 
burgeoning serials crisis as resulting from 
five basic causes: (1) the explosion in the 
number of serials titles; (2) the increasing 
size and frequency of many serials titles; 
(3) the concentration of these increases in 
the most expensive fields, particularly the 
sciences; (4) the key role of commercial, 
profit-seeking, international publishers in 
the production of serials, particularly in 
the scientific fields; and (5) the movement 
of monetary exchange rates and the use by 
publishers of differential regional prices 
to the detriment of North American li- 
braries. Okerson then recommended that 
the ARL should advocate: (1) the transfer 
of the publication of research results from 
serials produced by commercial publish- 
ers to existing noncommercial channels, 
specifically encouraging the creation of 
innovative nonprofit alternatives to the 
traditional commercial publishers and (2) 
policy changes by university administra- 
tions and granting agencies for promo- 
tion, tenure, and funding, so as to mini- 
mize pressure for excessive publication. 
The conclusions and recommendations of 
the two reports were basically adopted in 
a series of resolutions by the ARL mem- 
bership, although the one on excessive 
publication appears to have been toned 
down to a call that the "ARL form a partnership 
with scholarly groups to examine the 
scholarly publishing process and to find 
ways to manage the explosion in research 
and knowledge and the concomitant ex- 
plosion in publishing." All in all, it was a 
breathtaking stand against natural and so- 
cioeconomic forces, some of the latter of 
which had been developing for centuries. 

However, the problem might not lie in 
the skewed distribution of journal prices. 
Skewed distributions are so common in 
nature and society that in a certain sense 
being angry at one is almost akin to being 
upset that the stars distribute themselves 
unevenly across the universe in galaxies. 
Moreover, skewed socioeconomic distri- 
butions are so tenacious and powerful that 
any attempt to hammer them artificially 
flat runs the risk of ending in failure and 
disaster. The problem might lie in the 
relationship among the various skewed 
distributions, and for libraries the serials 
problem boils down to the following: It is 
known that not only are the prices of seri- 
als highly skewed, but so are the measures 
of their quality and utility, such as citations 
and library circulation. If the prices of 
journals are highly correlated with the 
measures of their quality and utility, then 
libraries are in a locked system, and any 
serials-cancellation project must fail. This 
is because the library will be forced to 
keep the 20% of the serials that consume 
80% of the serials budget, and — if the 
correlations are high enough — it is theo- 
retically possible to cancel the entire zero 
or random class or up to 40% of the col- 
lection without saving a penny in subscrip- 
tion costs. There is anecdotal evidence 
that this might be the case. Dougherty and 
Johnson (1988, 29) used the European
Journal of Pharmacology as an example of
a commercial publisher's raising the price 
of a periodical with a strong citation im- 
pact factor, and a survey of ARL directors 
by the Journal of Academic Librarianship 
evoked the following response (Dougherty 
and Barr 1988, 8): 

Every study we've done or seen indicates 
that high cost and high use are linked; and 
this limits our power to drop expensive 
journals, even where cooperation is as- 
sured. The publishers know what they are 
doing when they price their core journals.

It is my intention to explore the relationship
among the various skewed distributions
composing the library market for scientific
journals, using chemistry as a test case.

The Database 

The starting point for the construction of 
a database to analyze the library market 
for chemistry journals was a survey of the 
faculty of the Louisiana State University 
Department of Chemistry on their serials 
needs. This survey was conducted in April 
1993 as a pilot study for a serials-cancella- 
tion project. Twenty-five persons, or 
roughly 71% of approximately 35 profes- 
sors and instructors, responded to the sur- 
vey. Here it should be emphasized that 
only the Department of Chemistry was 
surveyed; the Departments of Biochemis- 
try and Chemical Engineering were not 
included in the pilot study. This omission 
will later be seen to have had statistical 
consequences. It should also be noted that 
there were organizational connections be- 
tween the faculties of the Departments of 
Chemistry and Biochemistry. One person 
served as distinguished professor in both 
departments, while an associate professor 
in the Department of Chemistry was also 
a member of the adjunct faculty of the 
Department of Biochemistry. 

In the survey, members of the chemis- 
try faculty were asked to identify those 
serials important to them for research and 
teaching purposes from the entire serials 
universe, without restricting themselves 
to the ones on subscription at LSU. The 
first thing that was done with the sample 
of serials resulting from this request was 
to counteract the effects of Garfield's law 
of concentration within it by restricting it 
in terms of subject coverage. Garfield 
(1979, 21-23) presents his law by pictur- 
ing the journal literature of a discipline as 
a comet. In this depiction the nucleus of 
the comet represents the core of the rela- 
tively few journals that publish the over- 
whelming majority of the material on the 
discipline important enough to be cited, 
whereas the tail of the comet is the expo- 
nentially increasing number of journals 
publishing an ever-decreasing quantity of 
significant papers on the subject. How- 
ever, according to Garfield, there is a con- 
siderable amount of disciplinary overlap, 
and his law of concentration states that 
"the tail of the literature of one discipline 
consists, in large part, of the cores of the 
literatures of other disciplines." In his 
opinion, this overlap is so great that the 
interdisciplinary core for all science disci- 
plines involves no more than 1,000 jour- 
nals and perhaps as few as 500. 

Garfield's law of concentration con- 
fronts the researcher with two major sta- 
tistical problems closely related to each 
other. First, serials from disparate disciplines
differ markedly from each other in
such quantitative measures as citation 
rates, library usage, price, and size. There- 
fore, mixing journals from different 
disciplines in the same sample nullifies 
significant statistical relationships. This 
phenomenon was demonstrated by
Stankus and Rice (1982) and Rice (1979) in
analyses of the correlations between
SCI-citation frequency and scientific-journal
usage at the State University of New York at
Albany (SUNYA). In their work they showed
that whereas no significant correlations were
found when SUNYA usage was tested
against SCI-citation frequency on a global
basis, i.e., for science as a whole without
regard to individual disciplines, excellent
and good correlations emerged between
these two variables as soon as the journals
were segregated according to subject,
scope, purpose, and language.

The second major statistical problem 
resulting from Garfield's law of concentra- 
tion is that it is virtually impossible to 
obtain an uncontaminated sample of seri- 
als from a single scientific discipline. This 
derivative from Garfield's law is evident in
the work of ISI on the classification of
journals into subject categories. Each year 
the institute applies cocitation and cluster 
analysis to its database to map the discipli- 
nary topology of science (Small and 
Garfield 1985), and it often places serials 
into more than one of the subject catego- 
ries classifying the journals covered by the 
SCI. A statistical consequence of the in- 
terdisciplinary nature of science com- 
bined with the highly skewed distribu- 
tions of its measures is the inherent risk of 
an extreme outlier in a data set or — in the 
definition of Barnett and Lewis (1984, 
4) — "an observation (or subset of observa- 
tions) which appears to be inconsistent 
with the remainder of that set of data." 
These outliers might often be the result of 
a sample of journals from one subject dis- 
tribution containing contaminants from 
another subject distribution (Barnett 
1978; Barnett and Lewis 1984, 1-44). 
Due to the operation of Garfield's law of 
concentration, such outliers cannot be ex- 
cluded on logical grounds, and it is only 
possible to explain — where feasible — 
their effect on the statistical results. 

The LSU chemistry faculty certainly
followed the dictates of Garfield's law of
concentration in their responses to the
serials survey, selecting journals in numerous
ISI subject categories. Among the ISI
subject categories of the journals chosen
by them were the following: Engineering,
Electrical, and Electronic; Environmental
Sciences; Geosciences; Materials
Science, Ceramic; Nutrition and Dietetics;
Physics; and Radiology and Nuclear
Medicine. As an example of the statistical
difficulties possible from retaining all the
titles, one of the 25 respondents picked
the prestigious New England Journal of
Medicine, a result that certainly would
have been different had the 25 respondents
been medical doctors. To control for
the effects of Garfield's law of concentration,
it was first decided to restrict the
sample to those titles selected by the LSU
chemistry faculty and classified by ISI in
the general subject category Chemistry.
However, because there was not enough
overlap to create a viable sample, one was
forced to run the increased risk of
contaminants and extend the sample to all
branches of chemistry, including chemical
engineering and crystallography. The SCI
subject class Spectroscopy was also included
because of the emphasis that the LSU
Department of Chemistry places on it, even
though this discipline is generally considered
part of optics within physics.

The titles chosen for inclusion in the 
database were then subjected to technical 
analysis via the OCLC Online Computer 
Library Center, Inc., cataloging system in 
order to clarify their 1993 compositional 
status and their history. With respect to 
compositional status the main goal was to 
check whether a given serial title con- 
sisted of one unit or was divided into sec- 
tions. The purpose of historical analysis 
was to trace the various title changes, di- 
visions into sections, and combinations 
into units of a serial back to the year of its 
establishment. The primary determinant 
of whether a serial remained the same 
publication through all these vagaries was 
the consistency and continuity of the vol- 
ume numbering. During the course of the 
data collection, it became necessary to 
establish a policy of aggregating all the 
sections of a serial into one unit. Thus, the
five sections of the Journal of the Chemical
Society (Chemical Communications, Dalton
Transactions, Faraday Transactions,
Perkin Transactions 1, and Perkin
Transactions 2) were treated as a single entity in
terms of statistical measures. The result
from the preceding steps was a serials data- 
base containing 154 observations. 

Three quantitative variables were em- 
ployed to measure the scientific value of 
the serials in the database. The first was 
called faculty score, and it was developed 
from information provided by the respon- 
dents to the April 1993 serials survey of 
the LSU Department of Chemistry. In 
this survey the chemistry faculty members 
were asked to prioritize their serials needs 
by identifying the titles important to them 
and dividing these titles into the three 
following groups: (1) those titles used fre- 
quently enough for teaching purposes to 
be needed on campus; (2) those titles used 
frequently enough for research purposes 
to be needed on campus; and (3) titles for 
both teaching and research that could be 
located off campus and satisfactorily accessed
through a rapid document delivery
service. Within each group the faculty 
members were requested to limit them- 
selves to ten titles, and for the first two 
groups they were asked to rank the titles 
in descending order of importance from 1 
to 10. The faculty members also estimated 
the frequency with which they thought 
the titles would be used. 

Inspection of the responses to the 
April 1993 survey did not reveal whether 
the LSU chemistry faculty as a whole re- 
garded teaching or research as more im- 
portant with respect to serials. As a result, 
it was decided to ignore this distinction, 
regroup the titles as to whether they were 
needed on campus or could be located off 
campus, and eliminate any double count- 
ing of titles by individual faculty mem- 
bers. Then each title was assigned 10 
points for every faculty member who 
chose it and another 10 points for every 
faculty member who wanted it on campus. 
If a title was placed in the off-campus 
group by a faculty member, it was given 
no extra points. The titles were also allo- 
cated points on how each faculty member 
ranked them, with 10 points given every 
rank of 1, down to 1 point given every rank 
of 10. If a faculty member chose more 
than ten titles and ranked titles 11 and 
lower, these titles were given 10 points for 
being chosen but no rank points. Finally, 
titles were assigned points on the faculty 
estimates of the frequency with which 
they would be used: 10 points for each 
faculty estimate of monthly or more often; 
5 points for each faculty estimate of less 
than monthly up to yearly; and 1 point for 
each estimate of yearly or less often. Fac- 
ulty members usually did not distinguish 
among the different sections of a serial, but 
where they did, the title was given the higher 
of the sectional scores. Moreover, where a 
faculty member scored a title twice — once 
for teaching, once for research — the serial 
was given the higher of the two scores if 
these were different. Under this system the 
highest number of points a faculty member 
could give a title was 40, and the maximum 
score a title could achieve was 1,000. The 
Journal of the American Chemical Society 
came closest to this maximum with a faculty 
score of 755. 
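
The scoring rules can be summarized in a short sketch. It is an illustration only, not the procedure actually used for the study: the function names and the frequency labels are hypothetical, and each response is assumed to have already been deduplicated as described above.

```python
# Frequency labels are hypothetical names for the three survey estimates.
FREQUENCY_POINTS = {
    "monthly_or_more": 10,
    "less_than_monthly": 5,
    "yearly_or_less": 1,
}

def member_points(on_campus, rank, frequency):
    """Points one faculty member contributes to one title."""
    points = 10                          # the title was chosen at all
    if on_campus:
        points += 10                     # wanted on campus
    if rank is not None and 1 <= rank <= 10:
        points += 11 - rank              # rank 1 earns 10, rank 10 earns 1
    points += FREQUENCY_POINTS[frequency]
    return points

def faculty_score(responses):
    """Sum of member points over all respondents who chose the title."""
    return sum(member_points(*response) for response in responses)

# A title ranked first, wanted on campus, and used monthly by all
# 25 respondents reaches the theoretical maximum.
print(faculty_score([(True, 1, "monthly_or_more")] * 25))
```

Under these rules a single respondent can contribute at most 40 points, so 25 respondents give the ceiling of 1,000 mentioned above.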
The remaining quantitative variables
for establishing the scientific value of a
serial were two citation measures, total
citations and impact factor, taken from the
1993 Science Citation Index Journal Citation
Reports (SCI JCR 1993). However,
before one can fully understand these
variables, it is necessary to understand the
ISI concept of a source item, which is a
research article, review article, or technical
note published in any of the journals
covered not only by the SCI but also by
ISI's other two indexes: the Social Sciences
Citation Index (SSCI) and the Arts
& Humanities Citation Index (A&HCI).

With the source item concept in mind, 
the variable total citations can be defined 
as the total number of references received 
by a serial in the database from Source
Items processed by ISI for the SCI, SSCI,
and A&HCI. Given Garfield's law of
concentration, this variable can be regarded
as measuring the importance of a serial to
all fields of human knowledge. Here it
must be noted that the SCI JCR does not
combine journal citation counts on the basis of
"lineage" except where a title change does
not affect a journal's alphabetical position,
nor does it combine the citation counts of
the different sections of a journal (SCI
JCR 1993, 7). However, for the total cita- 
tions measure utilized in this paper, it was 
decided to aggregate the counts of a peri- 
odical's sections and their backfiles due to 
the following reasons: (1) the LSU chem- 
istry faculty usually did not distinguish 
among the different sections of a journal; 
(2) it was desired to capture the full his- 
torical significance of a journal; and (3) 
the complex divisions and recombinations 
of a serial over its past often made it im- 
possible to allocate its historical citations 
among its present sections. The variable 
impact factor represents an attempt by ISI 
to create a normalized measure of value 
by controlling the citation frequency of a 
serial for age and size. This is done by 
limiting the backfile of a serial to the two 
years preceding the processing year of the 
JCR and then dividing the references to 
this two-year backfile by the number of 
Source Items in it to create an average 
citation rate per article. When required by 
the policy of aggregating journal sections
and their backfiles into single units, the
necessary adjustments were made to the 
appropriate impact factors in the 1993 
SCI JCR. 
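
The impact factor calculation can be illustrated as follows. The text does not state how the adjustments for aggregated sections were made; pooling the sections' citations and Source Items before dividing is an assumption of this sketch, and all numbers are hypothetical.

```python
def impact_factor(citations_to_backfile, source_items_in_backfile):
    """Average citations per Source Item in the two-year backfile."""
    if source_items_in_backfile == 0:
        raise ValueError("backfile contains no source items")
    return citations_to_backfile / source_items_in_backfile

def aggregated_impact_factor(sections):
    """Impact factor for a serial treated as one unit.

    `sections` is a list of (citations, source_items) pairs, one per
    section. Pooling both counts is an assumed method, not one that
    the text confirms.
    """
    total_citations = sum(citations for citations, _ in sections)
    total_items = sum(items for _, items in sections)
    return impact_factor(total_citations, total_items)

# Hypothetical two-section serial.
print(aggregated_impact_factor([(300, 100), (100, 100)]))
```

Pooling weights each section by its size, whereas a simple mean of the sectional impact factors would give a small section the same influence as a large one.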

Besides measures of scientific value, 
the database constructed for this article 
also contains a quantitative variable estab- 
lishing the economic worth of the serials 
in it. This variable was simply called 
"price" and was the subscription price 
paid in U.S. dollars during 1993 by insti- 
tutions in the U.S. international area. 
Where all the sections of a serial were 
offered in a package deal, the package 
price was used due to the policy of section 
aggregation. For the most part the prices 
were taken from the 1993 Faxon Guide to 
Serials, which was supplemented, when
necessary, by the 1993-94 EBSCO Librarians'
Handbook and the 1993 Swets
Serials Catalogue, as well as by the 1993 
and 1994 Books in Print. One commercial 
publisher had no standard listings for its 
journal prices, which were only found in 
Dutch guilders at the back of the Swets 
Serials Catalogue; they were converted 
into U.S. dollars at the exchange rate at 
the close of the first day of business of 
1993 as published in the Wall Street Jour- 
nal. 

To complete the database for this 
study, variables were developed for mea- 
suring certain characteristics thought to 
be important with respect to the scientific 
and economic value of serials. Three of 
these variables were quantitative variables 
like the preceding ones and can be briefly 
described. First, there is one called 
"source items," which was intended as a 
measure of the size of the serial. Defined 
by ISI, it is the number of research arti- 
cles, review articles, and technical notes 
published in the database's periodicals 
during 1993. The data for this variable 
were obtained from the 1993 SCI JCR. 
The second variable was called "journal 
age," and its purpose is clear from its 
name. This variable was derived by having 
the computer subtract from 1993 the year 
of the periodical's establishment found 
during the historical analysis of the title 
via the OCLC cataloging system. The 
third variable was called "libraries hold- 
ing," and it, too, was acquired from 
OCLC, which during operating year 
1993-94 had 18,168 member libraries in 
61 countries and territories (OCLC n.d., 
6, 26). Each OCLC cataloging record lists 
the number and abbreviations of the li- 
braries holding the item, and the key seri- 
als records are in the successive entry 
form where a new record is input every 
time the title changes. The libraries hold- 
ing data were taken from information on 
the catalog record for the title segment of 
the serial current in 1993 after carefully 
screening for duplicate records and li- 
brary holdings. Where different concur- 
rent sections of a serial were unevenly 
held by libraries, the serial was given the 
highest libraries holding number upon ag- 
gregation of the sections into one obser- 
vation. Even though some of the listings 
might represent subscriptions canceled 
by libraries and the overwhelming major- 
ity of the listed libraries were located in 
the United States despite OCLC's claims 
to international coverage, the libraries 
holding variable is considered a good esti- 
mate of the library market for these peri- 
odicals. Given the astronomical costs in- 
volved, libraries must represent the vast 
bulk of the market for most of these chem- 
istry journals. 

The final two variables in the serials 
database for this paper are qualitative or 
categorical variables intended to describe 
the nature of the publishers of the chemistry
periodicals. Of these variables the
first was called "publisher type" and denotes
whether a given serial was issued by
an association or by a commercial enter- 
prise. The second was named "country of 
origin," and it designates whether the 
publisher was located in the United States 
or was foreign. Information for the quali- 
tative variables was obtained from the 
same sources as price. With one Canadian 
and a few Japanese exceptions, all the 
foreign publishers were Western Euro- 
pean. 

The Distribution of the 
Quantitative Variables 

Examination of table 1 reveals that all the 
quantitative variables in the database are 
highly skewed in the pattern customary 



for library data. To construct this table, 
the observations for each variable were 
first arrayed in descending order and then 
divided into four classes whose limits were 
defined by the highest observation, the 
quartiles, and the lowest observation. By 
definition each class thus contains approximately
25% of the titles measured by
the variable under consideration. In every
case class 1, containing the observations 
with the highest values, accounts for more 
than half of the variable total, ranging 
from 51% of the total ages of the serials to 
80.2% of the total citations received by 
them. This percentage uniformly de- 
creases for every variable as one descends 
the order, until class 4, with the lowest 
values, is responsible for a relatively minute
portion of the variable totals, extending
from 1.8% of all the citations given the
serials in the sample to 8.9% of the librar- 
ies holding them. 
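
The construction of table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run for the paper: the function name `quartile_class_shares` and the sample values are hypothetical, and the classes are formed simply by cutting the descending array into four parts of nearly equal size.

```python
def quartile_class_shares(values):
    """Return the percentage of the variable total held by each of four classes.

    The observations are arrayed in descending order and cut into four
    classes of (approximately) equal size, so class 1 holds the highest values.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    total = sum(ordered)
    # Class boundaries at the quarter points of the ordered array.
    cuts = [round(n * k / 4) for k in range(5)]
    shares = []
    for start, end in zip(cuts, cuts[1:]):
        shares.append(100.0 * sum(ordered[start:end]) / total)
    return shares


# Hypothetical, highly skewed sample (not data from the paper).
sample = [755, 300, 120, 110, 80, 60, 50, 45, 40, 33, 20, 10]
shares = quartile_class_shares(sample)
print([round(s, 1) for s in shares])
```

With skewed data of this kind, the share of class 1 exceeds half of the total and the shares fall steadily toward class 4, which is the pattern table 1 reports for every variable.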

Given the results summarized in table 
1, it is not surprising that Shapiro-Wilk 
tests resoundingly rejected (p = 0.0001) 
for all variables the null hypothesis that 
the sample data were drawn from nor- 
mally distributed populations. This is a 
matter of great concern, because mark- 
edly non-normal data might lead to incor- 
rect conclusions in inferential statistical 
analyses as well as have a biasing effect on 
correlation coefficients and the more so- 
phisticated procedures based upon such 
coefficients (Hatcher and Stepanski 1994,
110). Many statistical techniques are 
based upon the assumption of normal dis- 
tributions, and, when dealing with data 
such as those of the chemistry serials, the 
researcher is confronted with a basic de- 
cision: either (1) rely upon nonparametric 
procedures, which are distribution free 
and resistant to outliers, or (2) prepare the 
variables for more powerful parametric 
treatment through their proper mathe- 
matical transformation. It was decided to 
opt for the latter course. 

The first step in deciding upon the 
proper transformation was to analyze the 
frequency distributions of the variables 
to determine whether they matched any 
single probability distribution. As part of 
this process, histograms of the variables 
were constructed, and they all turned out 
TABLE 1

Percentage Distribution of Quantitative Variables
over Classes Defined by Quartiles

                                Class 1             Class 2           Class 3         Class 4

Faculty Score
  Quartile limits               111-755             50-110            33-50           10-32
  Percentage of variable total  62.5                20.6              11.1            5.8

Total Citations
  Quartile limits               11,685-231,324      3,303-11,586      1,533-3,285     255-1,526
  Percentage of variable total  80.2                13.4              4.6             1.8

Impact Factor
  Quartile limits               3.018-37.885        1.730-2.952       1.049-1.697     0.111-1.035
  Percentage of variable total  61.9                19.4              12.2            6.4

Price
  Quartile limits               1,360.00-9,563.87   725.00-1,350.00   408.00-715.00   46.00-402.00
  Percentage of variable total  62.0                21.0              11.9            5.1

Source Items
  Quartile limits               563-3,916           261-551           100-257         5-96
  Percentage of variable total  67.8                20.1              9.2             2.9

Journal Age
  Quartile limits               39-161              28-38             21-28           3-21
  Percentage of variable total  51.0                22.7              17.6            8.8

Libraries Holding
  Quartile limits               528-1,728           319-519           229-318         55-225
  Percentage of variable total  51.6                23.4              16.2            8.9

Titles are arrayed in descending order, and each class contains approximately 25% of the titles. Each
variable has 154 titles except for Source Items, which has 151 titles.



remarkably similar. Interestingly enough, 
they closely resembled the histogram pre- 
sented by Lotka (1926) in his seminal 
paper in bibliometrics on the frequency 
distribution of scientific productivity. In 
his paper, Lotka noted that frequency dis- 
tributions of this general type have a wide 
range of applicability to a variety of phe- 
nomena. The histogram for faculty score 
is shown in Figure 1 as a typical example 
of those found for all the variables in the 
chemistry-serials data set. It should be 



pointed out that the isolated bar at 755 is 
not an outlier but represents the Journal 
of the American Chemical Society and, 
thus, the essence of the entire system. 

Skewed distributions are extremely 
common in nature, and for determining 
which probability distribution describes 
the frequency distributions of the vari- 
ables in the chemical-serials database, a 
statistical manual was utilized. This man- 
ual was developed by the British zoologist 
Elliott (1977) for analyzing samples of 
benthic invertebrates gathered in the
English Lake District. On the basis of 
Elliott's work it is possible to posit three 
basic ways in which phenomena distribute 
themselves over observations. One of 
these ways is random distribution, 
whereby there is no regularity to the man- 
ner in which the phenomena distribute 
themselves. The random pattern is best 
described by the Poisson distribution, and 
its distinguishing statistical characteristic 
is that the variance equals the mean. An- 
other distributional model is regular dis- 
tribution, and here the phenomena tend 
to disperse themselves evenly or uni- 
formly over the observations. This model 
is described by the positive binomial dis- 
tribution, whose distinguishing statistical 
characteristic is that the variance is less 
than the mean. The final theoretical way 
in which phenomena can arrange them- 
selves is contagious distribution. With 
contagious distribution the phenomena 
concentrate themselves on a relatively few 
observations, and the chances are that 
where you find one example of a phenomenon,
you will find another. There are
diverse patterns of contagious distribu- 
tions, and a number of mathematical mod- 
els have been put forward to describe 
them. The most useful of such models is 
the negative binomial distribution, and its 
distinguishing statistical characteristic is 
that the variance is greater than the mean. 
With respect to bibliometrics, Price 
(1976) considered the negative binomial 
distribution to be descriptive of the Mat- 
thew Effect propounded by Merton 
(1968) as an explanation of the manner in 
which rewards are allocated among scien- 
tists. Derived from the Gospel according 
to St. Matthew (13:12)— "For to those 
who have, more will be given, and they 
will have an abundance; but from those 
who have nothing, even what they have 
will be taken away" — the Matthew Effect 
embodies a system of cumulative advan- 
tage that appears to be operative in the 
highly skewed distributions of authorship, 
citations, library usage, etc. (Bensman 
1982; Bensman 1985). 

To determine which probability distri- 
bution was proper for the variables in the 
chemistry-serials database a chi-square 



test was used to see whether the condition 
of the Poisson distribution was met. The 
null hypothesis was set that the ratio of the 
variance divided by the mean, or V/M
ratio, equaled 1, and for every variable this 
null hypothesis was rejected with a zero 
probability that this was so. This result 
established that all the V/M ratios were 
significantly different from 1, and inspections
of these ratios showed that every one
was greater than 1, disproving the possi- 
bility of the positive binomial distribution. 
Because all the variances were demon- 
strably greater than their respective 
means, it was decided that the chemistry- 
serials variables were probably following 
the negative binomial distribution. Ac- 
cording to Elliott (1977, 33), the proper 
transformation for such variables is the 
logarithmic transformation, and in this 
paper log_e, or ln, is utilized when such
transformations are required. This policy 
is in conformance with the advice of the 
late Charles Winsor, who frequently pre- 
scribed the logarithmic transformation of 
all natural counts — no matter what the 
source — before their analysis, because 
the number of times the prescription 
harms the patient are few in comparison 
to the cures (Acton 1959, 223). 
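
The variance-to-mean reasoning above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions rather than the computation performed for this paper: the chi-square statistic is taken as (n - 1) times the V/M ratio, significance is judged with a common large-sample normal approximation instead of exact chi-square tables, and the function names and sample counts are hypothetical.

```python
import math
import statistics


def dispersion_test(counts):
    """Index-of-dispersion check of the Poisson condition (variance = mean).

    Returns the V/M ratio, the chi-square statistic (n - 1) * V / M, and a
    normal deviate d = sqrt(2 * chi2) - sqrt(2 * df - 1), an approximation
    that is reasonable only for larger samples (df above roughly 30).
    """
    n = len(counts)
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)  # sample variance, n - 1 divisor
    vm_ratio = variance / mean
    df = n - 1
    chi2 = df * vm_ratio
    d = math.sqrt(2 * chi2) - math.sqrt(2 * df - 1)
    return vm_ratio, chi2, d


def classify(vm_ratio, d, critical=1.96):
    """Name the distribution pattern suggested by the V/M ratio."""
    if abs(d) < critical:
        return "random (Poisson)"
    if vm_ratio > 1:
        return "contagious (negative binomial)"
    return "regular (positive binomial)"


# Hypothetical skewed counts (not data from the paper).
counts = [755, 300, 120] + [50] * 10 + [10] * 27
vm_ratio, chi2, d = dispersion_test(counts)
print(classify(vm_ratio, d))

# The logarithmic transformation (natural log) applied before parametric work.
logged = [math.log(c) for c in counts]
```

A V/M ratio far above 1, as in the hypothetical sample, points to the contagious pattern and hence to the logarithmic transformation; natural counts containing zeros would need an adjustment before logging, which this sketch does not attempt.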

The Measurement of 
Scientific Value 

In this approach to the measurement of 
scientific value, value is postulated as a 
construct of the human mind. Therefore 
value follows Bishop Berkeley's maxim 
that "to be is to be perceived." The thing 
being evaluated might possess objective 
attributes that might affect its subjective 
evaluation, but the final arbiter in matters 
of value is the human mind. The viewpoint 
adopted in this paper was succinctly stated 
by Cartter (1966, 4), who directed the 
1964 American Council on Education 
(ACE) assessment of quality in U.S. 
graduate education. Defending peer rat- 
ings as a proper methodology for assessing 
the quality of educational institutions, 
Cartter declared quality to be "an elusive 
attribute, not easily subjected to measure- 
ment." According to him, no single in- 
dex — be it size of endowment, number of 
books in the library, publication record of
the faculty, etc. — nor any combination of 
such measures is sufficient to estimate 
adequately the true worth of an educa- 
tional institution. Cartter stated that such 
"objective" measures of quality are for the 
most part "subjective" measures once re- 
moved, and he concluded, "In an opera- 
tional sense, quality is someone's subjec- 
tive assessment, for there is no way of 
objectively measuring what is in essence 
an attribute of value." 

From this perspective, of the three 
variables in the chemistry-serials database 
for measuring scientific value, faculty 
score is the crucial one, and the other 
two — total citations and impact factor — 
must be judged by their relationship to 
faculty score. To do this, the Pearson cor- 
relation coefficient of faculty score with 
both total citations and impact factor was 
computed after the logarithmic transfor- 
mation of all the variables recommended 
by Elliott (1977, 33, 102) was carried out. 
Then the correlations were treated as uni- 
variate regressions, and the data were ana- 
lyzed for outliers and inappropriate influ- 
ential observations, i.e., observations 
crucial in determining the slope of the 
regression line. In both cases faculty score 
was made the dependent variable, be- 
cause it was assumed to have the most 
error in it. After the proper exclusions the 
correlations were then recomputed. 
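
The core of this procedure, a Pearson correlation computed on logarithmically transformed variables, can be sketched as follows. The function names and the sample values are hypothetical, and the outlier and influence diagnostics described above are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_pearson_r(xs, ys):
    """Pearson correlation after the natural-log transformation of both variables."""
    return pearson_r([math.log(x) for x in xs], [math.log(y) for y in ys])


# Hypothetical values following an exact power law (not data from the paper).
xs = [1, 10, 100, 1000]
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys), log_pearson_r(xs, ys))
```

In this artificial case the relationship is perfectly linear only on the logarithmic scale, so the coefficient on the raw values understates an association that the transformed values capture fully.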

This procedure revealed a strong cor- 
respondence of faculty score to total cita- 
tions. The first Pearson product-moment 
correlation coefficient between the two 
variables was 0.66. Analysis of the residu- 
als turned up five outliers, of which four 
had low faculty scores with respect to their 
total citations. Of these outliers, two can 
be attributed at least partly to Garfield's 
law of concentration and the omission of 
the LSU Department of Biochemistry 
from the faculty survey, because these ti- 
tles were classified by ISI in Biochemistry 
and Molecular Biology. The fifth outlier 
was the Journal of Chemical Education, 
and its faculty score was high with respect 
to its total citations. This title ranks high- 
est on the variable libraries holding and is 
generally read more for information than 
cited for research. When all five outliers 



were excluded, the Pearson product-mo- 
ment correlation coefficient between fac- 
ulty score and total citations rose to 0.72. 

All in all, it was a remarkable perform- 
ance, for, on one side of the bivariate re- 
lationship, there was a small, nonrandom 
sample of local chemistry faculty and, on 
the other, a good proportion of the entire 
universe of publishing chemists. The re- 
sults validate the practice of utilizing local 
faculty for collection development pur- 
poses. Moreover, these results corrobo- 
rated an earlier finding made by this re- 
searcher (Bensman 1985, 22) correlating 
the total SSCI citations received by eco- 
nomics departments with the peer ratings 
of these departments by a scientifically 
selected sample of economics professors 
in the 1981 assessment of U.S. doctoral 
programs sponsored by the Conference 
Board of Associated Research Councils. 
The Pearson product-moment correlation 
coefficient was a stunning 0.92. Together, 
these two findings confirm the hypothesis 
that faculty score and total citations are 
just two different measures for the same 
variable of scientific value. 

However, it is a different story when it 
comes to the relationship of faculty score 
to impact factor. In this case the initial 
Pearson product-moment correlation co- 
efficient was a mere 0.25. There were two 
outliers, and, needless to say, confidence 
in impact factor as a measure of scientific 
value was not increased by the discovery 
that one of these outliers was the Journal 
of the American Chemical Society. When 
the two outliers were excluded, the Pear- 
son product-moment correlation coeffi- 
cient between faculty score and impact 
factor rose to only 0.27. These results 
were somewhat disconcerting as ISI gives 
considerable prominence to impact factor 
by devoting an entire section of its JCR to 
listing journals in descending order by this 
measure within subject categories. More- 
over, impact factor is commonly used as a 
value measure in studies of journal prices 
(Baldwin and Baldwin 1989; Barschall 
1988; Moline 1991; Nisonger 1993; Ribbe 
1988). 

Analysis of the possible causes for the 
relatively low correlation between faculty 
score and impact factor uncovered three 
possible explanations for this phenomenon.
As has been explained, impact factor
represents an attempt by ISI to create a
normalized measure of value by controlling
the citation frequency to a journal for
size and age to create an average citation 
rate per article over a recent time period. 
However, this procedure brings into play 
an exogenous variable that relates to the 
average citation rates of different types of 
articles and might have nothing to do with 
peer perceptions of value. Literature re- 
view articles have approximately double 
the impact factor of normal journal arti- 
cles (Moed and Van Leeuwen 1995, 463, 
table 1), and this type of article might be 
consulted by scientists more for conven- 
ience than for new, significant findings. 
The SCI JCR for 1993 has a section enti- 
tled "Source Data Listing" that presents 
data on the composition of the source 
journals in terms of review and nonreview 
articles. This section was utilized to deter- 
mine the nature of the top fifteen titles of 
the chemistry-serials database in impact 
factor. These titles represent 9.7% of the 
titles in the database but account for 
40.1% of the aggregate impact factor of 
the serials in the database. The "Source 
Data Listing" had information on twelve 
of these titles. Of these twelve titles, six 
were review journals consisting of 100% 
review articles, two were overwhelmingly 
review journals containing respectively 
92.9% and 81.3% review articles, and two 
were half review journals consisting re- 
spectively of 55% and 41.2% review arti- 
cles. Review articles represented an insig- 
nificant proportion of the final two titles. 

The other two possible explanations 
for the relatively low correlation between 
faculty score and impact factor relate to 
the very nature of the controls employed 
by ISI to create the latter measure. Con- 
cerning size, academic perceptions of 
quality or value appear to be greatly and 
positively influenced by this objective at- 
tribute. This phenomenon was observed 
in the two assessments of U.S. doctoral 
programs sponsored in 1981 and 1993 by 
the Conference Board of Associated Re- 
search Councils. The 1981 assessment 
(Jones, Lindzey, and Coggeshall 1982) 
found that in all subject fields peer per- 



ception of program quality correlated 
positively with measures of program size 
in terms of number of faculty members, 
students, and recent graduates, noting 
that the larger the program, the more 
likely its faculty was to be rated high in 
quality. Moreover, in the social sciences 
the 1981 assessment discovered that the 
influence of size also was operative with 
regard to faculty publications. For seven 
academic fields, peer ratings of the quality 
of program faculty members correlated 
highly with total articles attributed to 
these faculty members in journals covered 
by the SSCI (0.71 in Political Science to
0.80 in Sociology). However, as soon as 
the publication measure was corrected for 
size by reporting the fraction of program 
faculty members with one or more arti- 
cles, the correlations with peer ratings 
dropped markedly to a range from 0.26 in 
Anthropology to 0.59 in Geography. 

The findings of the 1981 assessment on 
the role of size in academic perceptions of 
quality were confirmed by the 1993 as- 
sessment (Goldberger, Maher, and Flat- 
tau 1995). In the 1993 assessment it was 
found that two basic groups of variables 
correlated strongly with peer ratings of 
the scholarly quality of program faculty: 
(1) "size" as defined by number of faculty, 
students, and graduates and (2) "level of 
faculty research and scholarship" as mea- 
sured by publications, citations, and 
grants. The 1993 assessment noted that 
the strong positive correlations between 
the size of a faculty in a program and its 
reputational standing have not been thor- 
oughly explored. Nevertheless, the same 
process at work in the 1981 and 1993 
assessments of U.S. doctoral programs ap- 
peared also to be active with respect to the 
titles in the chemistry-serials database. 
After being subjected to the same proce- 
dures as the correlations of faculty score 
with total citations and impact factor, the 
Pearson coefficient of faculty score with 
the size measure source items turned out 
to be 0.59 upon the exclusion of three 
outliers. 

To investigate the other control im- 
posed by ISI to create the measure impact 
factor, the effect of serial age on peer 
ratings was analyzed. This analysis was 
also intended to test the assumption that
the faculty would esteem more highly the 
older, more established journals. Serial 
age appears to influence peer ratings 
much less than serial size. When faculty
score was correlated with journal age in 
the manner outlined above, the initial 
Pearson correlation coefficient was 0.24. 
There proved to be three outliers, but, 
unlike the preceding examples, all the 
outliers were also found to be influential
observations able to affect the correlation 
coefficient out of proportion to their rep- 
resentation in the database. Upon investi- 
gation the three influential outliers turned 
out to be European journals over one hun- 
dred years old with the lowest possible 
faculty score of 10 and titles in a foreign 
language. Needless to say, their faculty 
scores were low in comparison to their 
total citations, further proving that these 
titles were not in the mainstream of U.S. 
chemistry. When excluded from the com- 
putations, the Pearson product-moment 
correlation coefficient between faculty 
score and journal age increased from 0.24 
to 0.34, or by more than 41%. 

Further investigation of the effect of 
serial age and size on peer ratings was 
done by regressing faculty score on source
items and journal age. During the analyses
conducted to decide upon the final form
of the regression equation, there were discovered
not only three outliers related to
extremes in source items but also five influential
observations that were deemed
inappropriate even though they were not
outliers. Three of these influential observations
had appeared as influential outliers
in the correlation of faculty score
with journal age above, whereas the other
two influential observations belonged to
the same category, being European and
over one hundred years old. Because the
five influential observations appeared to
be more in the European than in the
American tradition of chemistry, it was 
decided to exclude them along with the 
three outliers related to size. With this 
done, the total variance in faculty score 
caused by its regression on source items 
and journal age was 46%. However, there 
was a considerable amount of overlap in 
the effect of size and age. Source items by



itself accounted for 44% of the variance in 
faculty score and 31% of the variance over 
and above that of journal age, whereas 
journal age by itself accounted for 15% of 
the variance in faculty score but merely 
2% of the variance over and above that of 
source items. These results seem natural, 
because part of the size of a journal is its
backfile, which is also an expression of its
age. Due to all these considerations, it was 
decided to reject impact factor as a valid 
measure of scientific value. 
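
The overlap between size and age reported above follows from simple subtraction of the variance proportions. The sketch below uses only the percentages given in the text; the variable names are hypothetical, and the shared portion is derived arithmetic rather than a figure reported in the analysis.

```python
# Proportions of variance in faculty score, as reported in the text.
r2_full = 0.46          # source items and journal age together
r2_source_items = 0.44  # source items alone
r2_journal_age = 0.15   # journal age alone

# Variance each variable explains over and above the other one.
unique_source_items = r2_full - r2_journal_age
unique_journal_age = r2_full - r2_source_items

# Variance that cannot be assigned uniquely to either variable (derived).
shared = r2_full - unique_source_items - unique_journal_age

print(round(unique_source_items, 2), round(unique_journal_age, 2), round(shared, 2))
```

The two unique portions reproduce the 31% and 2% stated above, and the remainder of the 46% is variance that size and age account for jointly.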

The Structure of the Library 
Market for Chemistry Journals 

It was decided to begin the exploration of 
the structure of the library market for chemistry
journals by comparing the major
groups composing it. These groups are defined
by the categorical variables publisher
type and country of origin. As described
above, the first of these categorical variables
divides the journals into association and
commercial ones, whereas the second of 
these categorical variables divides them into 
U.S. and foreign journals. These groups 
were compared by testing whether there 
were significant differences between 
them in the means of the quantitative 
variables being used to measure the mar- 
ket. Given the highly skewed distributions 
of the quantitative variables, the non- 
parametric Wilcoxon rank sum test was 
performed to determine whether the dif- 
ferences between the means were signifi- 
cant. The results of the comparisons are 
summarized in table 2. With respect to 
publisher type, it can be seen that associa- 
tion journals are approximately half as ex- 
pensive, score 2.7 to 3.4 times higher on 
measures of scientific value, contain 2.2
times more articles, and are held by 2.1 times
more libraries than commercial journals. 
Moreover, all the differences between the 
means are highly significant. However, 
the situation is not so clear-cut when it 
comes to country of origin. At first glance 
the U.S. journals appear to have the ad- 
vantage over the foreign ones, but closer 
inspection reveals that the differences be- 
tween the means are significant in only 
two cases: (1) price, U.S. journals are half 
as expensive, and (2) libraries holding, 
U.S. journals are held by 1.6 times more libraries.
Even the latter advantage seems to
dissipate when it is remembered that U.S. 
libraries form the overwhelming bulk of 
the members of the OCLC network, from 
which the libraries holding data were 
taken. 
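
The multiples quoted in this comparison can be checked directly against the group means printed in table 2. The following sketch does only that arithmetic; the dictionary and variable names are hypothetical.

```python
# Group means as printed in table 2.
association = {"price": 720.71, "faculty score": 177, "total citations": 28218,
               "source items": 780, "libraries holding": 719}
commercial = {"price": 1346.00, "faculty score": 66, "total citations": 8198,
              "source items": 353, "libraries holding": 339}

# Ratio of the association mean to the commercial mean for each variable.
ratios = {name: association[name] / commercial[name] for name in association}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")

# U.S. versus foreign journals, for the two significant differences.
us_to_foreign_price = 747.56 / 1559.67
us_to_foreign_holdings = 541 / 333
print(f"price: {us_to_foreign_price:.2f}, libraries holding: {us_to_foreign_holdings:.2f}")
```

The ratios round to the figures given in the text: roughly half the price, 2.7 and 3.4 on the two value measures, 2.2 on source items, and 2.1 on libraries holding for association journals, and about half the price and 1.6 times the holdings for U.S. journals.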

For this paper the major tool to analyze 
the structure of the library market for chemistry
journals was a general linear regression
equation, which was preliminarily specified
in the following two models:

Model 1:

price = b0 + b1(publisher type) + b2(country of origin)
        + b3(faculty score) + b4(source items)
        + b5(libraries holding)

Model 2:

price = b0 + b1(publisher type) + b2(country of origin)
        + b3(total citations) + b4(source items)
        + b5(libraries holding)

The two models are identical except 
that in model 1 faculty score is used as the 
measure of scientific value, whereas in 
model 2 total citations serves as the mea- 
sure of scientific value. Because faculty 
score and total citations were considered 
in theory as basically equivalent measures 
and were highly correlated, it was ex- 
pected that their effects would be similar. 

Both models have the purpose of de- 
termining the role of publisher type, na- 



tional origin, scientific value, size, and the 
number of copies sold to libraries in the 
pricing of chemistry journals. As a result of 
the literature review and preliminary data 
exploration, the hypotheses were set that 
commercial journals would cost more than 
association ones and that foreign serials
would be priced higher than domestic ones. 
Therefore, the dummy variables publisher 
type and country of origin were coded in 
such a way that their coefficients would be 
positive if these hypotheses were true. 
Moreover, it was also posited that scientific 
value as measured by faculty score and 
total citations would positively affect prices
even if only because it seemed to be in the 
publishers' self-interest to make as inelas- 
tic as possible the library demand for the 
more expensive journals. As a matter of 
fact, it was even considered likely that sci- 
entific value would play such a role in jour- 
nal pricing that libraries would be trapped 
within the locked system described at the 
beginning of the paper. Size as measured 
by source items in terms of numbers of 
citable units was also thought to have a 
positive effect on prices, because larger 
journals were deemed more costly to pro- 
duce. On the other hand, the variable li- 
braries holding was hypothesized to have a 
negative regression coefficient as the cost 
per copy was assumed to decrease in line 
with the number of copies able to be 
printed and sold. 
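
How the dummy variables enter the preliminary equation can be sketched as below. The 0/1 coding direction (commercial = 1, foreign = 1) is an assumption chosen so that the hypothesized coefficients would be positive; the function names and the coefficient values are hypothetical and are not estimates from this study.

```python
def design_row(publisher_type, country, value, source_items, libraries_holding):
    """Build one regressor row: [1, D_publisher, D_country, value, size, holdings].

    Assumed coding: commercial = 1 (association = 0), foreign = 1 (U.S. = 0).
    """
    commercial = 1 if publisher_type == "commercial" else 0
    foreign = 1 if country != "United States" else 0
    return [1, commercial, foreign, value, source_items, libraries_holding]


def predict_price(coefficients, row):
    """Evaluate b0 + b1*x1 + ... + b5*x5 for one journal."""
    return sum(b * x for b, x in zip(coefficients, row))


# Hypothetical coefficients b0..b5, with signs following the stated hypotheses.
b = [200.0, 500.0, 150.0, 1.0, 0.8, -0.5]
assoc = predict_price(b, design_row("association", "United States", 100, 500, 300))
comm = predict_price(b, design_row("commercial", "United States", 100, 500, 300))
print(comm - assoc)  # equals b1: the across-the-board shift for commercial journals
```

Because a dummy takes only the values 0 and 1, its coefficient shifts the predicted price for the whole group without changing the slopes on the quantitative variables.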



TABLE 2

Comparison of Variable Means by Publisher Type and Country of Origin

                                                    Means
                          Price       Faculty Score   Total Citations   Source Items   Libraries Holding

Publisher Type
  Association N = 34      720.71      177             28,218            780            719
  Commercial N = 119      1,346.00    66              8,198             353            339
  P-value*                0.0002      0.0003          0.0224            0.0082         0.0001

Country of Origin
  United States N = 67    747.56      113             16,727            493            541
  Foreign N = 87          1,559.67    75              9,960             424            333
  P-value*                0.0001      0.8368          0.9811            0.9506         0.0012

*Difference between the means is statistically significant if p-value is below .05.






Exploratory computer runs of both 
models of the general linear regression 
equation immediately revealed major 
problems with the equation's functional 
specification resulting from the highly 
skewed distribution of the variables. 
These problems were of such a nature as 
to cause the violation of a number of the 
classical assumptions of linear regression. 
First of all, examination of the plots of the 
residuals against the predicted values of 
the dependent variable price showed that 
the regression function was not linear, and 
the constantly accelerating increases of 
the prices of the journals in the sample 
suggested that some sort of exponential 
relationship existed between the depend- 
ent and independent variables of the 
equation. Moreover, the same residual 
plots revealed that the error terms did not 
have a constant variance but were hetero- 
scedastic, increasing as the dependent 
variable price increased. Heteroscedastic- 
ity is a danger inherent in studies involving 
the comparison of groups (Hardy 1993, 
53-56). Finally, tests revealed that the er- 
ror terms were not normally distributed. 
Due to these violations of the classical 
linear regression assumptions, it was nec- 
essary to perform a transformation in or- 
der to introduce additivity into the equa- 
tion as well as to stabilize the variability 
and normalize the distribution of the error 
terms (Acton 1959, 219-23). Elliott 
(1977, 33, 102-3) calls for a logarithmic 
transformation, and the most frequent 
procedure is to transform the dependent 
variable (Sokal and Rohlf 1981, 539-41). 
The variable price was accordingly sub- 
jected to the logarithmic transformation, 
and this conversion of the equation to the 
semilogarithmic form corrected the above 
violations of the classical regression as- 
sumptions. 

With the main methodological prob- 
lems solved, it was decided to analyze the 
structure of the library market for chem- 
istry journals in three phases. The first 
phase was to investigate the role of all the 
independent variables together in deter- 
mining price with the full regression equa- 
tion. In the second phase the chemistry 
journals were segregated by publisher 
type to determine whether association 



and commercial publishers operated dif- 
ferently in the pricing of their journals. 
The third phase was to separate the jour- 
nals by country of origin for the purpose 
of analyzing the pricing policies of U.S. 
and foreign publishers. In all phases both 
models — model 1 using faculty score and 
model 2 utilizing total citations — of the 
regression equations were run. The 
semilogarithmic form with the transfor- 
mation of the dependent variable was the 
proper functional specification in all 
cases. 

Multicollinearity was investigated, but 
none was found. Before the final com- 
puter runs, the residuals were again exam- 
ined for outliers needing to be excluded. 
Four serials appeared as outliers, of which 
three were so by subject due to omissions 
in the faculty survey. Two of these subject 
outliers were classed by ISI in Biochem- 
istry and Molecular Biology and appeared 
repeatedly, whereas one was classified in 
Chemical Engineering and emerged only 
once. The fourth outlier was an egre- 
giously priced commercial journal. It was 
a consistent outlier and formed the apex 
of the price distribution. 

The results of the three phases of the 
analysis of the structure of the library mar- 
ket for chemistry journals are summarized 
in tables 3A, 3B, and 3C. Before these
results can be interpreted, the measure- 
ments employed in these tables need to 
be explained. In semilogarithmic equa- 
tions of the type utilized for this paper, 
proportional change is derived by taking 
the antilog of the regression coefficients 
and then subtracting 1. However, proportional
change must be understood differently
depending upon the type of variable.
Publisher type and country of origin are
intercept dummies. Their coefficients do
not affect the slope of the regression line,
but — when transformed as above — mea- 
sure in proportional terms how much 
more or less across the board a group 
causes the dependent variable to be with 
respect to some reference group. On the 
other hand, the quantitative variables fac- 
ulty score, total citations, source items, 
and libraries holding have slope coeffi- 
cients in that they determine the slope of 
the regression line. When transformed as 
above, these coefficients measure how
much compound proportional change — 
as in compound bank interest — a one unit 
change in the independent variable causes 
in the dependent variable (Halversen and 
Palmquist 1980; Hardy 1993, 56-60; 
Stundenmund and Cassidy 1987, 6-15, 
41-44; Thornton and Innes 1989). 
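
The conversion between a semilogarithmic regression coefficient and proportional change can be sketched as follows, assuming natural logarithms as used elsewhere in this paper. The function names are hypothetical; the two proportional changes used in the example are the model 1 values for publisher type and source items in table 3A, and the span of 100 source items is chosen only for illustration.

```python
import math


def proportional_change(coefficient):
    """Antilog of a semilog regression coefficient, minus 1."""
    return math.exp(coefficient) - 1.0


def coefficient_from_change(change):
    """Inverse conversion: the coefficient implied by a proportional change."""
    return math.log(1.0 + change)


def compound_change(change_per_unit, units):
    """Proportional change over several units, compounding like bank interest."""
    return (1.0 + change_per_unit) ** units - 1.0


# Intercept dummy: publisher type, model 1 (proportional change 0.8495).
b_publisher = coefficient_from_change(0.8495)
print(round(proportional_change(b_publisher), 4))

# Slope coefficient: source items, model 1 (0.0012 per additional source item).
print(round(compound_change(0.0012, 100), 4))
```

For a slope coefficient the effect of many units is therefore somewhat larger than the per-unit change multiplied by the number of units, which is the sense of the comparison with compound interest.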

Concerning the other table 3 measure- 
ments, beta weights are standardized mul- 
tiple regression coefficients, which are 
produced when the data are analyzed in 
standard score or z-score form. This de- 
notes that all the variables have been 
standardized to have a mean of 0 and a
standard deviation of 1. With this done, 
both the dependent and independent 
variables are measured on the same scale, 
and it is possible to interpret the absolute 
value of the beta weights as indicative of 
the relative importance of the inde- 
pendent variables in explaining the move- 
ments of the dependent variable. The 
measurement Uniqueness Index must be 
understood in terms of the concept R- 
square. R-square represents the propor- 
tion of variance in the dependent variable 



that is accounted for by the linear combi- 
nation of the independent variables. How- 
ever, the proportions of variance caused 
separately by each independent variable 
overlap each other, and the Uniqueness 
Index is the proportion of variance in the 
dependent variable that is accounted for 
by a given independent variable above and 
beyond the variance caused by the other 
independent variables in the regression 
equation (Hatcher and Stepanski 1994, 
395-408). 
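
For the simplest case of two independent variables, beta weights, R-square, and the Uniqueness Indexes can all be obtained from the three pairwise correlations. The sketch below uses the standard two-predictor formulas; it is an illustration with hypothetical correlations and a hypothetical function name, not the five-variable computation summarized in table 3.

```python
def two_predictor_summary(r_y1, r_y2, r_12):
    """Standardized regression of y on x1 and x2 from pairwise correlations.

    r_y1, r_y2: correlations of y with x1 and with x2
    r_12:       correlation between x1 and x2
    Returns (beta1, beta2, r_square, unique1, unique2).
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_square = beta1 * r_y1 + beta2 * r_y2
    # Uniqueness index: full R-square minus R-square without that variable.
    unique1 = r_square - r_y2 ** 2
    unique2 = r_square - r_y1 ** 2
    return beta1, beta2, r_square, unique1, unique2


# Hypothetical correlations (not values from the paper).
beta1, beta2, r_square, unique1, unique2 = two_predictor_summary(0.6, 0.4, 0.5)
print(round(r_square, 4), round(unique1, 4), round(unique2, 4))
```

When the two independent variables are correlated, the Uniqueness Indexes sum to less than R-square, the difference being the overlapping variance described above.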

With these explanations in mind, it is 
now possible to interpret the results of the 
analyses of the structure of the library 
market for chemistry journals summarized 
in table 3. Table 3A shows the results 
of the first phase in which all the journals 
(association and commercial, U.S. 
and foreign) were analyzed together. 
With respect to the dummy variables, the 
reference group for publisher type is all 
association (U.S. and foreign) journals, 
whereas the reference group for country 
of origin is all U.S. (association and commercial) 
journals. As was expected, both 
model 1 with faculty score and model 2 
produced similar results. In neither case 
did scientific value play a significant role 
in the prices libraries paid for chemistry 
journals. Moreover, in both models country 
of origin also did not play a significant 
role in determining price, but publisher 
type did. On the whole, libraries paid 85% 
more for commercial journals than for 
association ones with faculty score and 
79% more with total citations. The variables 
source items and libraries holding 
acted as hypothesized. In both models 
libraries paid about 0.1% more compounded 
for each citable unit and approximately 
0.1% less compounded with 
every subscribing library. 

TABLE 3A 

Proportional Change, Beta Weights, and Uniqueness Indexes 
Obtained in Multiple Regression Analyses Predicting Price 

All Serials Together 

Independent Variables   Proportional  Beta         Uniqueness 
                        Change        Weights      Indexes 

Model 1: Faculty Score Used for Scientific Value R-square = 0.5987*; N = 147 

Publisher Type           0.8495*      0.2745*      0.0477* 
Country of Origin        0.1995       0.0963       0.0070* 
Faculty Score            0.0014       0.1553       0.0092 
Source Items             0.0012*      0.6746*      0.2615* 
Libraries Holding       -0.0014*     -0.4714*      0.1103* 

Model 2: Total Citations Used for Scientific Value R-square = 0.5896*; N = 147 

Publisher Type           0.7854*      0.2588*      0.0430* 
Country of Origin        0.2000       0.0966       0.0070 
Total Citations         -0.000001    -0.0201       0.0001 
Source Items             0.0014*      0.7579*      0.1390* 
Libraries Holding       -0.0012*     -0.4020*      0.0941* 

*Significant at the 0.05 level. 
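The compounding can be made concrete with a short worked example. The sketch below takes the proportional changes reported for model 1 in table 3A and, assuming that the effects combine multiplicatively with the other variables held fixed, computes the predicted price of a journal relative to a baseline association journal. The function name and the scenario values are hypothetical.

```python
# Proportional changes reported for model 1 in table 3A.
PUBLISHER_TYPE = 0.8495      # commercial relative to association
SOURCE_ITEMS = 0.0012        # per additional source item
LIBRARIES_HOLDING = -0.0014  # per additional holding library


def price_multiplier(commercial: bool, extra_items: int, extra_libraries: int) -> float:
    """Predicted price relative to a baseline journal, other variables held fixed."""
    multiplier = (1.0 + PUBLISHER_TYPE) if commercial else 1.0
    multiplier *= (1.0 + SOURCE_ITEMS) ** extra_items
    multiplier *= (1.0 + LIBRARIES_HOLDING) ** extra_libraries
    return multiplier


if __name__ == "__main__":
    # Hypothetical scenarios, chosen only to show the compounding.
    print(f"100 more source items: x{price_multiplier(False, 100, 0):.4f}")
    print(f"50 more holding libraries: x{price_multiplier(False, 0, 50):.4f}")
    print(f"commercial, both changes: x{price_multiplier(True, 100, 50):.4f}")
```

Under these assumptions, 100 additional source items raise the predicted price by about 12.7 percent rather than the 12 percent that simple addition would give, which is the sense in which the change is compounded.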

Examination of the beta weights revealed 
a similar order of importance for 
the three significant independent vari- 
ables in each model. With both faculty 
score and total citations, size in terms of 
source items was the most determinant of 
price; number of copies sold in terms of 
libraries holding was the next most determinant; 
and publisher type was the least 
determinant of price. The beta weights 
findings were confirmed by the uniqueness 
indexes. Both models had similar R-square 
measures, with model 1 accounting 
for 60% of the variance in price and 
model 2 for 59%. The amount of unique 
variance accounted for by the three sig- 
nificant independent variables conformed 
to the order marked out by the beta 
weights. However, size in terms of source 
items accounted for almost double the 
unique variance in price in the faculty 
score equation than with the total cita- 
tions equation — 26% to 14%. 

The results of the first phase of the 
analysis of the library market for chemis- 
try journals were largely corroborated by 
the second phase of the analysis, whose 
findings are presented in table 3B. In the 
second phase the journals were segregated 
by publisher type, and the two sets 
of journals were analyzed independently 
with both models of the equation. With 
respect to the dummy variable country of 
origin, in the association set the reference 
group was the U.S. association publishers, 
whereas in the commercial set the refer- 
ence group was the U.S. commercial pub- 
lishers. Without going into detail, it can be 



seen that in all cases neither national ori- 
gin nor scientific value — whether mea- 
sured by faculty score or total citations — 
played a significant role in the prices paid 
by libraries for chemistry journals. Moreover, 
in all cases size and number of copies 
sold to libraries acted in the hypothesized 
manner, with the former raising prices 
and the latter lowering prices. Finally, in 
all cases size was more important than 
number of copies sold to libraries in set- 
ting prices. When the results of the first 
phase are taken into consideration with 
those of the second phase, association and 
commercial publishers appear to have 
priced their chemistry journals in the 
same manner except that the commercial 
publishers priced theirs at a higher level. 
In the light of the findings of the third 
phase of the chemistry-journal market 
analysis, it is also important to note that 
U.S. commercial publishers acted no dif- 
ferently in their pricing policies than did 
foreign commercial publishers. 

The third phase of the analysis of the 
library market for chemistry journals proved 
to be the most illuminating in many re- 
spects. Its results are shown in table 3C. In 
the third phase the serials were divided into 
two sets by country of origin, and these sets 
were analyzed independently from each 
other with both models of the regression 
equation. Concerning the dummy variable 
publisher type, for the U.S. journals the 
reference group was U.S. association jour- 
nals, whereas for the foreign journals the 
reference group was foreign association 
journals. 

With respect to the U.S. set, the results 
produced by model 1 of the regression 
equation with faculty score and model 2 
with total citations were interesting for 
both their similarities and their differ- 
ences. As for their similarities, with both 
models the dummy variable publisher 
type was a significant determinant of 
price, and in each case libraries paid con- 
siderably more — 80% more with faculty 
score, 77% more with total citations — for 
U.S. commercial journals than for U.S. 
association ones. These figures are 
equivalent to those produced for publisher 
type by the full regression equation in 
table 3A, and they show that the prices of 
U.S. commercial publishers for chemistry 
journals were in line with those of foreign 
commercial publishers. 

TABLE 3B 

Proportional Change, Beta Weights, and Uniqueness Indexes 
Obtained in Multiple Regression Analyses Predicting Price 

Serials Segregated by Publisher Type 

Independent Variables   Proportional  Beta         Uniqueness 
                        Change        Weights      Indexes 

Association Serials 

Model 1: Faculty Score Used for Scientific Value R-square = 0.7506*; N = 34 

Country of Origin        0.4174       0.1512       0.0208 
Faculty Score            0.0008       0.1406       0.0070 
Source Items             0.0010*      0.8327*      0.3869* 
Libraries Holding       -0.0009*     -0.4410*      0.1050* 

Model 2: Total Citations Used for Scientific Value R-square = 0.7466*; N = 34 

Country of Origin        0.3612       0.1337       0.0172 
Total Citations          0.000003     0.1145       0.0029 
Source Items             0.0009*      0.8019*      0.1<*83* 
Libraries Holding       -0.0008*     -0.3843*      0.1194* 

Commercial Serials 

Model 1: Faculty Score Used for Scientific Value R-square = 0.5381*; N = 115 

Country of Origin        0.0700       0.0351       0.0011 
Faculty Score            0.0025       0.1657       0.0159 
Source Items             0.0014*      0.6352*      0.2557* 
Libraries Holding       -0.0019*     -0.4374*      0.1503* 

Model 2: Total Citations Used for Scientific Value R-square = 0.5222*; N = 115 

Country of Origin        0.1040       0.0513       0.0025 
Total Citations          0.0000004    0.0059       0.00001 
Source Items             0.0016*      0.7128*      0.1209* 
Libraries Holding       -0.0017*     -0.3935*      0.1048* 

*Significant at the 0.05 level. 

Concerning their differences, with fac- 
ulty score as the measure, scientific value 
had a significant and positive effect on 
price, but this result was negated when 
total citations were employed as the meas- 
ure, and scientific value again reverted to 
having no significant influence on price. 
The variables source items and libraries 
holding yielded a mixed bag of similarities 
and differences. Both variables per- 
formed according to expectations as 
prices increased with size and decreased 



with number of copies sold to libraries. 
However, in model 1 libraries holding had 
both a higher beta weight and uniqueness 
index than source items, reversing the 
usual order of importance for these vari- 
ables, whereas in model 2 libraries hold- 
ing had a lower beta weight but higher 
uniqueness index than source items, pro- 
ducing a confusing picture. All in all, when 
the national origin of the faculty raters and 
the majority of the OCLC holding librar- 
ies are taken into account, this analysis of 
U.S. chemistry journals suggests that U.S. 
publishers might be somewhat attentive 
to opinions and needs of the U.S. academic 
community. 

TABLE 3C 

Proportional Change, Beta Weights, and Uniqueness Indexes in 
Multiple Regression Analyses Predicting Price 

Serials Segregated by Country of Origin 

Independent Variables   Proportional  Beta         Uniqueness 
                        Change        Weights      Indexes 

U.S. Serials 

Model 1: Faculty Score Used for Scientific Value R-square = 0.5093*; N = 64 

Publisher Type           0.7992*      0.3158*      0.0728* 
Faculty Score            0.0035*      0.5229*      0.1004* 
Source Items             0.0007*      0.5220*      0.1639* 
Libraries Holding       -0.0018*     -0.7532*      0.2673* 

Model 2: Total Citations Used for Scientific Value R-square = 0.5898*; N = 63 

Publisher Type           0.7725*      0.3152*      0.0739* 
Total Citations         -0.000003    -0.0951       0.0017 
Source Items             0.0015*      0.8689*      0.1395* 
Libraries Holding       -0.0013*     -0.5705*      0.2168* 

Foreign Serials 

Model 1: Faculty Score Used for Scientific Value R-square = 0.5115*; N = 84 

Publisher Type           0.8168       0.1625       0.0229 
Faculty Score            0.0010       0.0759       0.0030 
Source Items             0.0012*      0.7280*      0.3178* 
Libraries Holding       -0.0010*     -0.2050*      0.0256* 

Model 2: Total Citations Used for Scientific Value R-square = 0.5097*; N = 84 

Publisher Type           0.8283       0.1642       0.0232 
Total Citations          0.00001      0.0957       0.0012 
Source Items             0.0012*      0.6853*      0.0720* 
Libraries Holding       -0.0009      -0.1990       0.0220 

*Significant at the 0.05 level. 

When the third phase of the analysis of 
the library market for chemistry journals 
was turned to the set of foreign serials, the 
variables for the most part resumed the 
same basic pattern as in the first two 
phases with one notable exception. Con- 
cerning the basic pattern, in neither the 
form of faculty score nor of total citations 
was scientific value a significant determi- 
nant of price. Both models showed size in 
number of source items as playing the 
major role in causing some journals to cost 
more than the others. The variable librar- 
ies holding had contradictory outcomes, 



having a significant effect on price in 
model 1 but not being significant in model 
2. However, libraries holding did not miss 
being significant by much (p = 0.06) in 
model 2. The one notable exception con- 
cerned the variable publisher type, which 
was not significant for the overall level of 
price in either model. Given the previous 
findings, this result was so startling that it 
prompted a reexamination of the basic 
data. An answer was readily found. 

Figure 2. Publisher Structure of Library Market for Chemistry Journals (Association Publishers Designated 
by 0s; Commercial Publishers Designated by 1s) 

Of the 67 U.S. journals in the chemistry-serials 
database, 29 were published by 
associations, and of these 29 association 
journals, 20 were put out by the American 
Chemical Society. In contrast, of the 87 
foreign journals in the database, only 5 
were published by associations: one by 
the National Research Council in Canada 
and four by the Chemical Society in Britain. 
All the remaining serials in both groups 
were commercial journals except for one 
foreign journal published by a university 
press. Moreover, the British Chemical So- 
ciety charges like a commercial publisher 
for its major publication. When all five 
sections were combined, the Journal of 
the Chemical Society ranked third in price 
of the 154 serials in the database, and it 
cost $1.94 per ISI Source item in 1993. In 
comparison, the Journal of the American 
Chemical Society cost only $0.46 per 
Source Item in that same year. Outside of 
Britain, Canada, and the United States, 
the standard pattern for journals issued 
under association auspices was to be han- 
dled by a commercial publisher. Given the 
above considerations and that U.S. com- 
mercial publishers charge like foreign 
ones, there is a major dichotomy in the 
library market for chemistry journals be- 



tween largely U.S. association journals, on 
the one hand, and all commercial journals, 
on the other. 

Simply conceived, a market occurs 
when values are exchanged among enti- 
ties. In the library market for chemistry 
journals, libraries exchange money pre- 
sumably for scientific value. To obtain a 
picture of this market, both measures of 
scientific value — faculty score and total 
citations — were plotted against price, 
producing similar patterns. These plots 
are shown in figures 2 and 3. 

Figure 3. National Structure of Library Market for Chemistry Journals (U.S. Journals Designated by 0s; 
Foreign Journals Designated by 1s) 

For analysis of the publisher structure 
of the market, in figure 2 association journals 
are designated by "0" and commercial 
journals by "1." For examination of the 
national structure of the market, in figure 
3 U.S. journals are designated by "0" and 
foreign journals by "1." The plots reveal 
graphically three major characteristics of 
the library market for chemistry journals. 
First, this market bifurcates into two 
groups. On the one hand, running parallel 
to the faculty score/total citations axes, 
there is what might be called the "high-value" 
group, where scientific value is 
more than warranted by prices, and the 
bulk of the scientific value is concentrated 
due to the skewed distribution of the vari- 
ables. On the other hand, running roughly 
parallel to the price axis, there is what 
might be termed the "high-cost" group, in 
which prices are higher than justified by 
scientific value and costs are concen- 
trated. Second, U.S. association journals 
compose the vast majority of the serials in 
the high-value group, determining its na- 
ture, whereas foreign commercial jour- 
nals make up the lion's share of the high- 
cost group. However, given the consistent 
lack of statistical significance for the 
dummy variable country of origin, the 
U.S. journals are in the high-value group 
not because they are U.S. journals but 
because they are association ones, and the 
foreign journals are in the high-cost group 
not because they are foreign but because 
they are mainly commercial journals. 
Third, from the perspective of the library 
market for chemistry journals as a whole, 
while neither U.S. associations nor com- 



mercial publishers take scientific value 
into account when pricing their journals, 
the former charge too little for scientific 
value, and the latter, too much. 

Practical Implications of the 
Structure of the Library Market 
for Chemistry Journals 

The distributions of the variables measur- 
ing price and scientific value in the chem- 
istry-serials database appear to belong to 
the same mathematical family — the nega- 
tive binomial — and, therefore, have the 
same highly skewed pattern. However, 
here the resemblance ends, and the diver- 
gence begins. Fundamental to this diver- 
gence are the dissimilar causes underlying 
the skewed nature of the scientific value 
and price distributions. Whereas the high 
rating of some journals by peer opinion 
and the concentration of citations on 
these journals can be interpreted as resulting 
from a cumulative advantage process 
or a success-breeds-success mechanism 
based upon the social stratification 
of science (Bensman 1982; Bensman 
1985), the origins of the skewed distribu- 
tion of prices lie elsewhere. In the latter 
case the major roles are played by the 
different pricing policies of the commer- 
cial versus the U.S. association publishers 
and the size of the journals. There must 
be added to this mixture, in my opinion, 
an element of larceny on the part of one 
foreign commercial publisher, whose 
egregiously priced chemistry journal 
formed the apex of the price distribution 
and was a consistent outlier due to its cost 
and value being so flagrantly out of line. 
Any chance for the prices of the chemistry 
journals in the database for this paper to 
derive from the same cumulative advan- 
tage process as their scientific value was 
negated by the negligible role scientific 
value played in their pricing. 

The major practical conclusion from 
the divergence of the price and scientific 
value distributions is that libraries are not 
caught in the locked system described at 
the beginning of this paper but are in a 
position to implement a massive restructuring 
of their serials collections. If journals 
in other subject areas bifurcate in the 
same way as in chemistry, and if one oper- 
ates on the assumption that in times of 
budgetary stringency libraries should sub- 
scribe to the best journals and provide 
only remote access through document de- 
livery to the others, then the opportunity 
exists for libraries to downsize their serials 
collections in a significant way. To test this 
possibility, a software package called the 
Serials Evaluator has been developed at 
Louisiana State University. Based upon a 
Statistical Analysis Software (SAS) plat- 
form, the Serials Evaluator incorporates 
the statistical principles presented in this 
paper and analyzes sets of journals within 
subject classes by comparing their prices 
to their utility measures. These utility 
measures are of the following three types: 
(1) numerical indexes derived from sur- 
veys of the local faculty and experts; (2) 
ISI citation data; and (3) usage data gath- 
ered from library automation systems. A 
model of the Serials Evaluator utilizing 
the manual input of data has been completed, 
and it is intended to interface the 
Serials Evaluator to the LSU NOTIS system 
for the automatic retrieval of subject, 
price, and usage information. 

For experimental purposes, the Serials 
Evaluator was utilized to explore the pos- 
sibilities of downsizing the set of 154 
chemistry journals in the database for this 
paper. In the experiment both faculty 
score and total citations were employed as 
utility measures. The Serials Evaluator of- 
fers a choice of two basic algorithms. En- 
visioned for use by small departmental or 
special libraries, one algorithm exploits 
the full divergence of price from scientific 
value by proposing for cancellation all 
journals whose percentage of total cost 
exceeds their percentage of total utility. 
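The selection rule of this first algorithm can be sketched as follows. This is an illustration of the rule as stated above, not the SAS code of the Serials Evaluator; the class, the function names, and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    title: str
    price: float
    utility: float  # for example, faculty score or total citations


def propose_cancellations(journals):
    """Journals whose share of total cost exceeds their share of total utility."""
    total_cost = sum(j.price for j in journals)
    total_utility = sum(j.utility for j in journals)
    return [j for j in journals
            if j.price / total_cost > j.utility / total_utility]


def summarize(journals, cancelled):
    """Return (cost reduction, utility loss) as fractions of the totals."""
    cost_cut = sum(j.price for j in cancelled) / sum(j.price for j in journals)
    utility_lost = sum(j.utility for j in cancelled) / sum(j.utility for j in journals)
    return cost_cut, utility_lost


if __name__ == "__main__":
    # Hypothetical data, not drawn from the chemistry-serials database.
    sample = [Journal("A", 100.0, 50.0), Journal("B", 900.0, 50.0)]
    cancelled = propose_cancellations(sample)
    print([j.title for j in cancelled], summarize(sample, cancelled))
```

Because the rule compares shares rather than absolute amounts, it needs no threshold from the user, which suits the small libraries for which it was envisioned.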

With faculty score as the utility mea- 
sure, the Serials Evaluator designated for 
elimination 83 titles for a total cost reduc- 
tion of 79% and a total utility loss of 
34.8%, whereas, with total citations as the 
utility measure, the Evaluator selected for 
cancellation 105 titles for a total cost re- 
duction of 77.1% and a total utility loss of 
27.4%. Due to the high correlation be- 
tween faculty score and total citations, 
there was a considerable amount of overlap, 
and 69 titles were common to both the 
faculty score and total citations cancella- 
tion lists. 

The other basic algorithm offered by 
the Serials Evaluator is to allow the user 
to set goals in terms of cost reduction and 
utility retention. In this algorithm the Se- 
rials Evaluator forms two different sets — 
one from the journals with the highest 
prices, another from the journals with the 
highest utility — and then compares these 
sets to select for cancellation only those 
high-price journals that are not in the 
high-utility set. In the experiment with 
the chemistry-serials database, the default 
value 75% was utilized for both cost reduction 
and utility retention. Using faculty 
score as the utility measure, the Serials 
Evaluator listed 30 titles whose 
elimination would result in a 34.1% reduction 
in total cost with a mere 9.2% loss in total 
utility. With total citations as the utility 
measure, the Serials Evaluator named 37 
journals whose cancellation would reduce 
total costs by 40.8% with only a 10.3% loss 
in total utility. There was again a consider- 
able amount of overlap, and 26 titles were 



LRTS • 40(2) • The Structure of the Library Market /169 



on both the faculty score and total cita- 
tions cancellation lists. Of great interest 
was the finding that when the titles on sub- 
scription at the LSU Chemistry Library in 
1993 but not named by the faculty in the 
survey were weeded for those either not 
covered by the SCI or classed by ISI in 
nonchemistry subject groups as well as in 
only Biochemistry and Molecular Biology 
or Chemical Engineering, 49 journals 
costing $32,406.43 at 1993 prices re- 
mained to be considered for cancellation. 
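The goal-based algorithm described above can be sketched in the same way. The text does not specify how the two sets are formed, so the sketch rests on an assumption: the high-price set consists of the most expensive journals that together reach the cost goal as a share of total cost, and the high-utility set consists of the most useful journals that together reach the retention goal as a share of total utility. All names and figures are hypothetical.

```python
def top_share(journals, field, share):
    """Titles of the highest-ranked journals on `field` that together reach `share` of its total."""
    total = sum(j[field] for j in journals)
    chosen, running = set(), 0.0
    for j in sorted(journals, key=lambda j: j[field], reverse=True):
        if running >= share * total:
            break
        chosen.add(j["title"])
        running += j[field]
    return chosen


def goal_based_cancellations(journals, cost_goal=0.75, utility_goal=0.75):
    """High-price journals that are not also in the high-utility set."""
    high_price = top_share(journals, "price", cost_goal)
    high_utility = top_share(journals, "utility", utility_goal)
    return sorted(high_price - high_utility)


if __name__ == "__main__":
    # Hypothetical data, not drawn from the chemistry-serials database.
    sample = [
        {"title": "A", "price": 500, "utility": 5},
        {"title": "B", "price": 300, "utility": 60},
        {"title": "C", "price": 150, "utility": 30},
        {"title": "D", "price": 50, "utility": 5},
    ]
    print(goal_based_cancellations(sample))
```

Under this reading the goals bound the two sets rather than the outcome, which would be consistent with the realized cost reductions reported above falling well short of 75%.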

Notes on Operations 



Technical Services Workstations: 
A Review of the State of the Art 



Michael Kaplan 



The technical services workstation is an evolving technology. In its fully 
evolved state, it consists of a higher-end personal computer that is networked 
to local online systems, bibliographic utilities, and the Internet. It has 
available to it all the typical administrative tools associated with local area 
network (LAN) technology. A suite of technical services resources has been 
or is being developed to complement it, including the Library of Congress 
Cataloger's Desktop, LC Classification, and Dewey Decimal Classification. 
A number of institutions are placing local or standard, national-level docu- 
mentation on the Internet in hypertext markup language (HTML) form. 
Enhancements such as macro-driven processing are becoming common. 
With the advent of fully functional Windows terminal emulation programs 
for bibliographic networks, the promise of multiple online sessions is becoming 
a reality. Other Windows terminal emulators and interfaces and Windows 
clients will be appearing soon, and Z39.50 clients are starting to appear 
bundled in with these packages. 



Not all developments in technical services 
can be pinpointed as definitively as 
the advent of the format for Machine- 
Readable Cataloging (MARC) or the ap- 
pearance of our national bibliographic 
utilities. Much as the bibliographic utili- 
ties, particularly the OCLC Online Com- 
puter Library Center, Inc. (OCLC) and 
the Research Libraries Information Net- 
work (RLIN), began as regional or as spe- 
cialized databases that expanded upon 
their geographic or special membership 
affiliations, so too the technical services 
workstation (TSW) can trace its origins 
directly to the MARC format and the utili- 
ties that made MARC records available to 
libraries. By that I mean that we can trace 



the TSW's roots back ultimately to the 
primitive terminals that we used to con- 
nect to the utilities over leased telephone 
lines in the early 1970s. As those early 
terminals began to give way to the first 
generation of personal computers (PCs) 
in the early 1980s, when OCLC and RLIN 
released the first versions of their terminal 
software for PCs, it became inevitable that 
the single-session, primitive PC would 
evolve into a more advanced and multi- 
functional platform. 

Despite the fact that for many of us the 
OCLC M300 was our first glimpse of a 
PC, the evolution of the TSW was tortur- 
ous for more than a decade, until a few 
pioneering institutions began to nudge 
the state of the art in technical services 
forward. Those institutions — Columbia 
University, Cornell University, the Li- 
brary of Congress (LC) and the University 
of California-San Diego were among the 
most prominent — had a vision, or a variety 
of visions, all converging on the goal of 
expanding and enhancing not only the 
catalog with new resources, but also en- 
dowing their technical services staff with 
new technologies. In retrospect it is clear 
that the MARC format, online systems, 
and telecommunications networks (both 
LAN- and Internet-based) would converge 
around an advanced PC. The ultimate 
goal will be to concentrate as much 
easily manipulable technical processing 
power on the desktop as possible. The 
current movement from mainframe-based 
processing to client-server architecture 
is conceptually not that different. 
Both have as their goals to empower the 
desktop computer and enable it to share 
the processing load with the central computer 
(server). Indeed, the very movement 
toward client-server architecture is 
enabling further development of TSWs 
because institutions realize that the finan- 
cial investment in the TSW desktop and 
telecommunications infrastructure is an 
investment — a down payment — on that 
future, fundamental migration to a new 
architectural environment. 

ALA Character Set 

Perhaps the most crucial need that has 
always separated library computing and 
telecommunications — particularly in 
large, academic research libraries — from 
the rest of the computing world has been 
the need to support the American Library 
Association (ALA) character set. The ba- 
sic ASCII character set has never sufficed 
for libraries, nor will it ever. Ultimately we 
will have Unicode, but that is a new, still- 
ongoing, development. The need to be 
able to represent at least the characters 
used in languages that use the Roman 
alphabet and in the transliteration of non-Roman 
alphabets has driven many of the 
decisions library technical services and library 
systems offices have made over the 
last twenty-five years. 
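The gap between basic ASCII and the needs of library data can be shown in a few lines. The sketch below uses Unicode as implemented in Python's standard library; it does not model the ALA character set itself, and the function names and sample name are chosen only for illustration.

```python
import unicodedata


def fits_ascii(text: str) -> bool:
    """True if every character of `text` can be encoded in basic ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def decompose(text: str):
    """Names of the base letters and combining marks that make up `text`."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]


if __name__ == "__main__":
    name = "Dvo\u0159\u00e1k"  # a composer's name with two diacritics
    print(fits_ascii(name), fits_ascii("Dvorak"))
    print(len(name.encode("utf-8")), "bytes in UTF-8")
    for part in decompose("\u00e9"):
        print(part)
```

A name stripped to plain ASCII still displays, but it no longer matches the form a cataloger transcribed, which is why partial support for the character set is a real compromise.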



Despite all this, one of the significant 
compromises that many libraries have had 
to make as they have moved into the net- 
worked world has been precisely to accept 
less than full support for the ALA charac- 
ter set. Not every local system has enjoyed 
a communications package built expressly 
for it that is simultaneously conversant 
with the ALA character set. One of the 
most surprising revelations in the TSW 
survey that the Cooperative Cataloging 
Council's Automation Task Group (now 
the Program for Cooperative Cataloging 
[PCC] Standing Committee on Automat- 
ion) conducted in the summer and fall of 
1994 was the wide variety of communica- 
tions packages in use among libraries and 
that not all of them (only 74%) fully supported 
the ALA character set (Kiegel 
1994b, questions 13 and 14). 

Nonetheless, while libraries have had 
to settle for this compromise, they have 
still indicated that their clear preference 
remains to provide support for the full 
character set if and when possible (Kiegel 
1994b, question 14). The PCC Standing 
Committee on Automation is now resur- 
veying this population for an Association 
of Research Libraries (ARL) SPEC Kit, 
expected to appear in early 1996. Prelimi- 
nary indications are that this situation has 
not yet changed dramatically, though the 
advent of a new generation of workstation 
products seems to indicate that it will be 
changing within the next one or two years 
as a new generation of Windows terminal 
emulation programs and clients is re- 
leased. 

When OCLC developed the M300, 
support for the ALA character set was a 
hardware-based solution: OCLC devel- 
oped a specialized chip to support the 
character set. As the power of PCs and 
their graphics capabilities advanced, how- 
ever, programmers developed solutions 
that combined off-the-shelf hardware (an 
EGA graphics card) with specialized soft- 
ware. OCLC made this decisive step for- 
ward with Terminal Software version 5.1. 
(This was before the shift to the Passport 
series of software releases.) The ex- 
ecutables in this program were dated Au- 
gust 4, 1987. Terminal Software 5.1 made 
use of loadable fonts using the EGA video 






mode. At that time the VGA mode had 
already been introduced by IBM, but it 
was too expensive a hardware option, so 
OCLC did not implement it on that level. 
Instead the EGA loadable fonts were also 
supported on VGA so that the configura- 
tion screen treated them alike as a single 
option. OCLC first implemented EGA on 
Wyse monochrome EGA monitors. 
Monochrome mode still switched on 
older Tseng and OCLC character ROMs 
on the Tseng graphics cards and modified 
IBM monochrome cards because OCLC 
needed backward compatibility with older 
monochrome hardware (Truthan 1995). 

Moving support of the ALA character 
set from specialized hardware to a combi- 
nation of standard hardware and custom- 
ized software went hand in hand with 
OCLC's resolve to free itself from being 
the required supplier of hardware and al- 
lowed it to focus its agenda and its ener- 
gies on software development. In retro- 
spect it was a major step in decoupling use 
of the OCLC online system from OCLC 
hardware and in enabling libraries to pur- 
sue a vision of standardization with the 
rest of the computing world. 

Similarly, as libraries have moved the 
technical services modules of their OPACs 
from terminals, which often supported the 
ALA character set by means of installed 
hardware or emulation cartridges (such 
was the case with mid-1980s Telex terminals 
and late-1980s and early-1990s IBM 3163 
terminals) to PCs, full support for the 
character set has lagged behind. Only a 
limited number of solutions for supporting 
the character set have been developed for 
use within the DOS environment. The 
ones most familiar to NOTIS libraries, for 
instance, are Yale University's YTerm and 
Cornell University's TN3270 software. 
They offered libraries a glimpse of a net- 
worked future that fulfilled their needs at 
a time when a Windowed, multisession, 
and multitasking environment was still just 
an over-the-horizon dream. 

Of the two emulation packages 
designed for NOTIS, YTerm is the older 
and the one more closely identified with 
prenetworked telecommunications. While 
YTerm can run over a local network, its 
primary means of conducting telecommu- 



nications sessions is via a PC's COM ports. 
And while both YTerm and TN3270 can 
run under Windows in so-called DOS 
boxes, Cornell's software was designed 
around networking and the Internet. In 
that respect it was the first serious pack- 
age that enabled those libraries whose 
catalog infrastructure (i.e., IBM main- 
frame based) it fit to move into a wired, 
networked world. 

The Cornell software, however, had 
several major drawbacks. It was not a true 
Windows application, so it could not run in 
a true multisession environment. The clos- 
est catalogers could come was to run mul- 
tiple applications and switch (via Alt-Tab) 
between them, but regardless the sessions 
could not be simultaneously displayed in a 
series of open windows. It supported only 
a very limited number of network cards. 
And, perhaps most significant, it soon be- 
came an orphan product: further develop- 
ment work on it ceased just as advances in 
PC and telecommunications technology, 
coupled with significant downward trends 
in their pricing, brought the vision of mul- 
tiple sessions within the reach of the library 
community. 

For NOTIS libraries the current succes- 
sor to Cornell TN3270 is a software package 
developed by Pierre Goyette and his col- 
leagues at McGill University (information 
on the McGill packages can be found on the 
World Wide Web (WWW) at http:// 
masicm.m(^.ca/~roy^ttpTCP3270.html). 
Available for Windows as TCP3270, this 
software represents a giant stride in the 
consolidation of technical services appli- 
cations in a single environment, namely 
Windows (it is also available for DOS as 
NET3270). TCP3270 is a true Windows 
application, with pull-down menus, online 
help, mouse control, sizable fonts, etc. It 
has four attributes that cause it to stand 
out at this early stage of technical services' 
migration to Windows: 

• It can run multiple sessions (up to five 
at a time). 

• It is designed specifically for use on 
local networks and the Internet. 

• It is Winsock compatible. 

• It uses a Windows True-Type font to 
enable display of the full ALA charac- 
ter set. 

It can also print the full ALA character 
set on any Windows-supported graphics 
printer. This is a demonstrable advantage 
over the old Cornell software, which was 
unable to print the full character set. 

WINSOCKS AND WINDOWS 

"Winsock compatibility" is a term that is 
already frequent in library telecommuni- 
cations literature, and it will become more 
so. A Winsock is an application program- 
mer's interface (API) that essentially al- 
lows Windows programs to work over the 
Internet — in simple terms, it allows a 
Windows program to "talk" over the In- 
ternet. It is, in fact, Winsock compatibility 
that is one of the crucial factors in ena- 
bling multiple Windows sessions. The 
new RLIN Windows software, OCLC 
Passport for Windows, WWW browsers 
such as Mosaic and Netscape, SeaChange's 
BookWhere? Z39.50 package — all are 
Winsock compatible, and that means that 
all can coexist as simultaneous sessions. 
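
As a rough present-day illustration of what a sockets interface makes possible, the sketch below uses Python's standard socket module (which sits on top of Winsock when run under Windows) to hold several network sessions open at the same time. The loopback echo server and the session names are stand-ins invented for this example; they do not represent any of the products named above.

```python
import socket
import threading


def start_echo_server():
    """Start a loopback echo server (a stand-in for a remote host)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
    server.listen(5)

    def echo(conn):
        # Send back whatever the client sent, then close the connection.
        with conn:
            conn.sendall(conn.recv(1024))

    def serve():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # the listening socket was closed
            threading.Thread(target=echo, args=(conn,), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    return server, server.getsockname()[1]


def run_sessions(names):
    """Open one connection per name, keep all of them open together,
    send each its own name, and collect the echoed replies."""
    server, port = start_echo_server()
    sessions = {name: socket.create_connection(("127.0.0.1", port), timeout=5)
                for name in names}
    try:
        for name, sock in sessions.items():
            sock.sendall(name.encode("ascii"))
        return {name: sock.recv(1024).decode("ascii")
                for name, sock in sessions.items()}
    finally:
        for sock in sessions.values():
            sock.close()
        server.close()


if __name__ == "__main__":
    # Hypothetical session labels, chosen only for illustration.
    print(run_sessions(["local-catalog", "utility", "web-browser"]))
```

Each entry in the returned dictionary comes from a separate connection that stayed open alongside the others, which is the sense in which socket-based programs can coexist as simultaneous sessions.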

The shift from DOS-based programs 
to a set of programs and utilities all built 
around Microsoft Windows has enormous 
inherent advantages for the profession be- 
sides the character set and Winsock com- 
patibility issues. The most significant of 
these are a common, consistent interface 
and the ability to copy and paste between 
applications. A consistent interface will 
lead to simplification in training routines 
because, as staff become accustomed to 
dealing with various Windows applica- 
tions, the learning curve becomes much 
flatter. Making copy and paste between vari- 
ous library bibliographic systems 
fully functional, however, will eventually 
require some further work to enable 
proper treatment of non-ASCII charac- 
ters as they move through the Clipboard 
from one application to another. 

During the next one to two years we 
can expect to see an entirely new genera- 
tion of products begin to emerge from 
local systems vendors as they begin to 
develop their own true workstations. First 
to emerge will be true Windows emulators 
with advanced functionality. Close on the 
heels of those products will be true Win- 
dows clients incorporating Z39.50 as an 
integral part of their functionality. The 
workstation package that is most ad- 
vanced at this stage is probably that from 
the Library Corporation (Bibliofile), the 
Integrated Technical Services Worksta- 
tion (ITS for Windows), which already 
incorporates context-sensitive help, on- 
line documentation, and authority check- 
ing and will soon have Z39.50 and an ad- 
vanced macro language. GEAC is also 
hard at work on its package, as indeed are 
most of the other major vendors, though 
at the fall 1995 NOTIS Users' Group 
Meeting it was apparent that the 
Ameritech Library Services Academic Li- 
brary Division is still in the early stages of 
designing a more narrowly defined Cata- 
loger's Workstation (Meyer 1995; Weiss- 
man 1995). 

Library of Congress 
Workstations 

A word about LC's Bibliographic Worksta- 
tion (BWS) is in order. LC has long been 
saddled with what it has recognized as an 
archaic mainframe system that has griev- 
ously limited its ability to move forward on 
a broad range of technical fronts. Partly to 
counter that problem, but also to replace 
their old ComTerm terminals, LC began 
to develop a technical services worksta- 
tion designed to work within the con- 
straints of their network and system. The 
result was the BWS, an IBM PC running 
under OS/2 with software developed spe- 
cially for LC. One of the most significant 
aspects of that software was its ability to 
support the ALA character set. LC's deci- 
sion to focus its time, energies, and finan- 
cial resources on OS/2, rather than, say, 
Microsoft Windows, was and is an ex- 
tremely interesting decision. The auto- 
mation experts at LC are strong advocates 
of OS/2, but the general marketplace has 
resoundingly chosen to focus on Win- 
dows. LC's decision might well have been 
the correct one, at least for LC, but it does 
not appear that it will be a lead that will 
be followed widely any time soon. 

In the last four years the Cataloging 
Directorate of LC has begun to install 
BWSs on a massive scale. Management at 
LC, as elsewhere, looks to advanced technology 
as one means to increase produc- 
tivity in an era of retrenchment and down- 
sizing. During 1994 and 1995 members of 
the Cataloging Directorate (Robert Au- 
gust, Richard Thaxter, and David William- 
son, and also Larry Dixson in Network 
Development, among others) developed 
an impressive number of utilities (Elec- 
tronic CIP, Text Capture and Electronic 
Conversion [TCEC]). While it is too early 
to make firm predictions, they have begun 
to investigate a client-based Z39.50 appli- 
cation, BookWhere?, as a potential re- 
placement to their host-based Z39.50 and 
as a means of extending and enhancing 
their catalogers' electronic reach across 
cyberspace (Williamson 1995a; Thaxter 
1995). By means of these programs cata- 
logers at LC might eventually conduct 
remote searches, retrieve records, and 
mark them up with ISBD punctuation for 
conversion to MARC records. Staff at LC 
are also investigating the possibility of us- 
ing these programs to feed MARC records 
to the LC system, MUMS. Furthermore, 
by doing much of this searching of remote 
databases via Z39.50, they have the advan- 
tage of access to many disparate systems 
and catalogs without having to learn the 
command structures and syntax associ- 
ated with each of them. 

Recently members of the Cataloging 
Directorate have begun to experiment 
with a new program, ClipSearch, which 
was developed as a utility for use with the 
BWS. It provides the BWS with the capa- 
bility of intersession searching, automatic 
copying of headings from authority re- 
cords to bibliographic records, as well as 
online MARC code lists. They plan to 
enhance it soon to provide for automatic 
generation of name authority records as 
well (Williamson 1995b). 

It is a curious fact that as the power of 
desktop computers grew over time and as 
their prices dropped, library cataloging 
backlogs grew to crisis proportions. Read- 
ers of library literature are all familiar with 
the depressing frequency with which 
phrases such as "crisis" and "backlogs" 
have appeared in library technical services 
literature. The causes for this (the explo- 
sion in publishing and all the other reasons 
proffered) are well known. Solutions, 
such as cataloging simplification and reli- 
ance on various levels of minimal-level 
cataloging, are also well known. Develop- 
ment of technical services workstations 
can also be viewed as a solution, and a 
much more positive and proactive one 
(though possibly also a defensive one) 
than these other solutions, which technical 
services staff have tended to accept be- 
grudgingly. We live in an age of restruc- 
turing and reengineering and the applica- 
tion of business techniques to library 
services, and there is no doubt that TSWs 
enable library staff to work smarter and 
accomplish "more, faster, cheaper, bet- 
ter." But it has been only in the last few 
years that libraries have begun to invest in 
powerful workstations in large numbers. 

There exist, in fact, few more positive 
statements about catalogers' willingness 
to adapt and innovate than this technology 
and the burgeoning number of applica- 
tions and utilities, many of which repre- 
sent grass-roots efforts, being developed 
for use with the technical services work- 
station. A recent article on "reinventing 
catalogers," which describes catalog de- 
partments as not having changed much in 
the last twenty years, entirely misses the 
point of how catalogers are becoming in- 
creasingly entrepreneurial and innovative 
(Waite 1995). Attendance at the series of 
Association for Library Collections & 
Technical Services (ALCTS) institutes, 
"Technical Services Workstations: The 
State of the Art of Cataloging," as well as 
at a number of talks I have given on work- 
station technology during the past year, is 
vivid witness to catalogers' palpable hun- 
ger to adopt this technology. Doing so, 
however, demands a shared partnership 
between technical services departments 
and administrators to demonstrate the 
need and justify the potential dividends 
from these devices. Those institutions that 
have already done so are already reaping 
the rewards of their foresight (Kaplan 
1995a; Kaplan 1995b, 10). 

Recently Roger Brisson of Pennsylva- 
nia State University and Janet McCue of 
Cornell University discussed the transfor- 
mation and retooling currently under way 
in technical services departments that are 
adopting workstation technology. They 
sketch a number of cataloging scenarios in 
which they contrast the old, manual 
method of cataloging with the new means 
by which catalogers can assemble a record 
by gathering information electronically 
(Brisson 1995; Brisson and McCue 1996). 
The "vivid descriptions" that the Coopera- 
tive Cataloging Council Task Group I in- 
cluded in its final report, by means of 
which they challenged us to imagine a 
totally online, networked future, are in- 
deed starting to become our new reality 
(CCC Task Group I 1994). 

Impact on Productivity 

One of the most significant revelations of 
the previously mentioned TSW survey 
was the impact of TSWs on productivity (Kaplan 
1995a). Included among those results 
were the following: 

• Cornell University's Mann Library: 
acquisitions time cut in half 

• Harvard College Library's Cataloging 
Services Department: production up 
63% despite an 18% reduction in hours 

• Library of Congress: productivity for 
certain phases up by as much as 25% 

• New York Public Library: significant in- 
crease in throughput with fewer staff 

• Pennsylvania State University: 200% 
to 300% increase in productivity for 
original cataloging 

• UCLA: increased total output with de- 
creased number of staff 

• UNLV: 10% less time to catalog LC 
copy; 25% less time for member copy 

• University of North Texas: disappear- 
ance of backlogs (including long-term 
backlogs) 

Electronic Documentation 

LC, through the Cataloging Distribution 
Service (CDS), is developing the Cata- 
loger's Desktop and LC Classification 
Plus precisely as a means of enabling its 
catalogers to increase productivity. The 
intent is to convert a large number of 
unwieldy paper documents to an im- 
proved, networkable, windowed, elec- 
tronic format, with all the advantages of 
electronic indexing and easy updating. 
These documents once consumed entire 
shelves of reference space at a cata- 
loger's desk. Most institutions, of which I 
expect Harvard College Library was typi- 
cal, could not afford numerous sets of all 
these diverse documents, particularly the 
LC classification schedules, with all of 
their attendant updates, commercial re- 
compilations, etc. A few complete sets 
were available at widely scattered loca- 
tions, and they were not quickly accessible 
to all catalogers. Moreover, keeping these 
schedules current was a time-consuming 
task, and consulting the basic schedule 
plus all of its updates was not a particularly 
efficient (or pleasant) way to work. With 
Classification Plus it will be possible to 
check the schedules, already recompiled, 
from any LAN workstation. 

CDS has made rapid progress in con- 
verting the most consulted paper docu- 
ments to electronic format. The list, which 
is still growing, currently includes: 

• Library of Congress Rule Interpretations 

• Music Cataloging Decisions 

• Subject Cataloging Manual: Classifi- 
cation 

• Subject Cataloging Manual: Subject 
Headings 

• USMARC Concise Formats: Holdings 
Data 

• USMARC Concise Formats: Classifi- 
cation Data 

• USMARC Concise Formats: Commu- 
nity Information Data 

• USMARC Format for Authority Data 

• USMARC Format for Bibliographic 
Data 

• USMARC Code Lists for Countries, 
Geographic Areas, Languages and Re- 
lators, Sources, and Descriptive Con- 
ventions 

As for Classification Plus, CDS plans 
to release about seven schedules in early 
1996 (including E-F, H, L, R, T, and pos- 
sibly Z), with the remaining schedules to 
be released on a staggered basis as they 
pass quality assurance in LC's Cataloging 
Policy and Support Office (CPSO). Clas- 
sification Plus will also include the LC 
subject headings (not in tagged format) 
with many links between the subject 
headings and the classification schedules. 
True hypertext links will be made between 
call numbers embedded in the subject 
headings and their appearance in the 
schedules, but it will be possible to do 
keyword searches, for instance, between 
captions and other occurrences in the 
schedules and the subject headings. In- 
deed, CPSO plans over time to use this 
product as a means to regularize and en- 
rich the vocabulary shared by the sched- 
ules and the subject headings, as well as to 
add more classification numbers to Li- 
brary of Congress Subject Headings itself. 

The search engine that CDS selected 
for both Cataloger's Desktop and Classifi- 
cation Plus is from Folio Corporation. 
Among the various advantages that this 
selection offers is that it is a Windows 
product, allowing catalogers to keep their 
various cataloging sessions open while 
consulting relevant documentation. It also 
has easy networking capabilities, support 
of hypertext links, and the ability to sup- 
port local "shadow" files consisting of local 
text highlighting, bookmarks, hypertext 
links, or notes. All of these options can be 
implemented on an individual or depart- 
mental basis to support local options and 
local decision making, and all are recon- 
cilable and can be carried forward from 
issue to issue of the Desktop. 

Use of a common interface — Folio 
Views — for the Desktop and Classifica- 
tion Plus will expedite the training process 
because staff will only have to learn to 
navigate and use a single program. There 
is a reasonable expectation that CDS will 
be able to secure price savings as well for 
institutions ordering both packages be- 
cause they will not need redundant soft- 
ware licenses from Folio Corporation. 

As their plans for a machine-readable 
version of Anglo-American Cataloguing 
Rules, second edition, 1988 revision 
(AACR2), go forward, ALA Editions is 
also considering use of Folio Views as the 
software engine for what is envisioned to 
be a CD-ROM product. While there are 
many issues regarding the copyright hold- 
ers yet to be resolved, and many other 
matters still to be finalized, ALA Editions 
is hoping to have a product ready to demo 
in time for the ALA Annual Conference 
in 1996. If all goes well, it will be demon- 
strated there at a program sponsored by 
the Library and Information Technology 
Association (LITA)/ALCTS Microcomput- 
ers for Technical Services Interest Group 
(by June this group should be renamed 
the LITA/ALCTS Technical Services 
Workstation Interest Group). 

At the same time as CDS is preparing 
Classification Plus, Gale Research is 
working on a competing product, a net- 
worked version of SUPERLCCS. While it 
was demonstrated along with Classifica- 
tion Plus at the ALA Annual Conference 
in Chicago in June 1995, it is too soon to 
tell how the marketplace and catalogers in 
the field will accept either product. It is 
already clear, however, that Gale Re- 
search has the more difficult job because 
its editorial staff cannot rely on the policy 
experts in LC's CPSO for help in design- 
ing their product, while the availability of 
this support has been a major factor in the 
development of Classification Plus at LC. 

The subject of online classification 
should not pass without mentioning 
OCLC Forest Press' development of 
Dewey for Windows, the Windows ver- 
sion of its Electronic Dewey. The DOS 
version has long been out, while the Win- 
dows version remained a research project 
until mid-1995. OCLC Forest Press has 
now committed itself to bring out the 
Windows version, and it is expected to 
appear in mid-1996 (OCLC Forest Press 

criticism of the earlier version, namely 
that it was not a networked application. 

Cataloger's Desktop is rapidly becoming 
the de facto standard in documentation. By 
the fall of 1995 over 180 institutions had 
entered subscriptions to it, far surpassing 
CDS' expectations. Some institutions (e.g., 
Pennsylvania State University and Johns 
Hopkins University) have decided to mesh 
their local processing manuals and decisions 
closely with the Desktop by creating local 
"infobases" for their manuals. They can then 
hypertextually link their own decisions to 
those of the Library of Congress. The na- 
tional trend, however, is not to make use of 
Folio Views in this fashion, but to create 
local Web sites on which to mount local 
documentation. The University of Virginia 
was among the early leaders in this trend, 
but now there are a number of prominent 
sites with more and more documentation 
that is generically useful and easily avail- 
able over the Internet. A few examples are 
the Name Authority Cooperative (NACO) 
Manual 
(http://infoshare1.princeton.edu/katmandu/cp20/cp20_toc.html), 
an online glossary of European languages 
(http://buddy.library.mun.ca/~charlHP9/biblang.html), 
and Tools for Serial Catalogers 
(http://www.library.vanderbilt.edu/ercelawn/serials.html). 
One of the most extensive is TPOT, Technical Processing 
Online Tools at the University of California, 
San Diego (http://tpot.ucsd.edu), 
which also includes Barbara Stewart's list 
"Top 200 Technical Services Benefits of 
Home Page Development." Many of 
these Web sites point to one another, and, 
as with the Web in general, more tools and 
more sites are appearing all the time. Ex- 
ploiting the growth of electronic cataloging 
tools on the Web is, in fact, one of the most 
fertile and innovative areas for applying 
technical services workstations. 

Local Area Networks 

Local area networks, once rarely found in 
libraries, are becoming commodity items. 
It is actually their development and the 
concomitant development of technical 
service tools designed specifically to run 
on networks that are enabling technical 
services staff to redefine their workplace 
as a collaboratory one. In the series of 
ALCTS institutes on "Technical Ser- 
vices Workstations: The State of the Art of 
Cataloging," which was previously men- 
tioned, Janet McCue of Cornell Univer- 
sity goes so far as to suggest that being 
connected to a network and the Internet 
is an essential attribute of a technical ser- 
vices workstation. The explosive growth of 
the Internet has only made this criterion 
more pronounced. 

From the administrator's perspective, 
one obvious advantage in placing com- 
puter packages on local area network ser- 
vers is that it eliminates the need to buy 
and install a copy of every program on 
every staff member's machine. A site li- 
cense for ten simultaneous users might 
well be capable of serving a population of 
one hundred staff. Programs have been 
developed to provide simultaneous-use 
metering and enable publishers' and de- 
velopers' rights to be protected. The 
equivalence suggested for Cataloger's 
Desktop, for example, is one simultane- 
ous license for each ten potential users. 
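
The metering idea can be sketched in a few lines. The class and function names below are hypothetical and are not taken from any actual metering product; the sketch only assumes the one-license-per-ten-users ratio mentioned above and shows how a fixed pool of simultaneous seats can serve a larger staff.

```python
class ConcurrentLicensePool:
    """Hypothetical meter that caps how many users run a program at once."""

    def __init__(self, seats):
        self.seats = seats
        self.active = set()

    def check_out(self, user):
        """Return True if the user may run the program, False if every seat is busy."""
        if user in self.active:
            return True  # this user already holds a seat
        if len(self.active) >= self.seats:
            return False  # all simultaneous licenses are in use
        self.active.add(user)
        return True

    def check_in(self, user):
        """Release the user's seat so that someone else can use it."""
        self.active.discard(user)


def seats_needed(potential_users, users_per_seat=10):
    """Seats implied by a one-seat-per-N-users rule, rounded up."""
    return -(-potential_users // users_per_seat)


if __name__ == "__main__":
    pool = ConcurrentLicensePool(seats_needed(100))
    print(pool.seats, "simultaneous seats for 100 potential users")
```

Under this rule a staff of one hundred shares ten seats; an eleventh simultaneous user is turned away until someone checks a seat back in.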

Workstations and 
the National Utilities 

In the middle to late 1980s the national 
bibliographic utilities began to worry 
about their continued viability in the face 
of rapid growth and development of local 
systems. As local systems grew ever more 
powerful, with workstations capable of ad- 
vanced editing features, and particularly 
as large research libraries began to mount 
the LC MARC file locally as a resource file 
for use in acquisitions and cataloging, the 
utilities began to worry about the effect this 
would have on their financial well-being. 
This led in part to the utilities' creating 
incentives for contribution of original re- 
cords to their databases. At RLIN that re- 
ward has taken the form of free searches, 
whereas OCLC awards actual monetary 
credits that have increased markedly over 
time. OCLC currently awards an institu- 
tion inputting a new bibliographic record 
online (not via tape or batch loading) a 
credit of $3.65. The OCLC policy, known as 
Contribution Pricing, has been evolving 
now for almost a decade and, in addition 
to rewarding institutions for contributing 
original cataloging, is theoretically calcu- 
lated to reach a point where there are no 
charges for cataloging usage, but rather 
only for various kinds of access (search- 
ing, holdings, etc.). 

Tape loading of bibliographic records 
and holdings from local systems into the 
national utilities has grown exponentially 
over the past decade. Where tape loading 
was once the exclusive province of the 
national libraries, it has now become so 
common among large libraries that both 
OCLC and RLIN have had to make major 
arrangements to handle tape loading. For 
many years RLIN made use of 
long weekends for batch loading of re- 
cords, and it has now developed a new 
database configuration to enable it to deal 
with batch loading more efficiently. 
OCLC opened the door, albeit only 
slightly, toward monetary credits for tape- 
loaded records when they began to credit 
each tape-loaded record twenty-five 
cents. (The original rationale for credits 
was to reward institutions for the addi- 
tional searching required to establish the 
uniqueness of a new bibliographic record. 
Over time the rationale at OCLC has 
shifted to encouraging and rewarding the 
contribution in a more direct fashion, but 
batch-loaded original records still re- 
ceived only a minimal financial reward on 
the theory that the machine at OCLC, and 
not a cataloger in the field, is doing the 
work to establish a record as original.) 
Then, too, record loading itself has begun 
to change dramatically as institutions have 
begun to move away from exchanging re- 
cords on physical tapes and have migrated 
toward using FTP (file transfer protocol) 
as the preferred medium of exchange. It 
is worth noting that this was one of the 
major recommendations of the Coopera- 
tive Cataloging Council Task Group II on 
Availability and Distribution of Records 
(CCC Task Group II 1994). 

The move toward creating original 
catalog records in local systems, rather 
than directly online in the national utili- 
ties, has become a fundamental tenet of 
cataloging policy in many organizations, 
such as Harvard College Library. Sev- 
eral causes have contributed toward this 
trend, including the power of worksta- 
tions connected to local systems and the 
(local) preexistence of acquisitions- 
level records, which catalogers can up- 
grade more quickly and easily than by 
keying a record from scratch in the utili- 
ties. Figures from OCLC bear out the 
significance of this trend: while it is true 
that between July 1, 1994, and June 30, 
1995, OCLC saw more than 1.5 million 
original records created directly online, 
nonetheless in that same period OCLC 
batch loaded more than 314,000 mem- 
ber records that were received either on 
tape or via FTP (Greene 1995). This 
latter area, particularly where FTP is 
concerned, will clearly be an increas- 
ingly popular means of entering records 
into the OCLC database. It is worth 
noting that discussions are taking place 
with the goal of extending the batch-loading 
option in OCLC to CONSER and 
PCC BIBCO records. 

The growth of local systems, again par- 
ticularly among large libraries, has led to 
both RLIN and OCLC developing techni- 
cal means whereby those institutions 
could integrate access to the utilities with 
their local infrastructure. At RLIN this 
was originally known as LANTerm, later 
EtherTerm, but RLIN has now moved 
away from this methodology in favor of 
almost total reliance on the Internet — 
mediated at an institution's discretion by 
use of a CompuServe Dedicated IP Net- 
work or dial connection, plus some mini- 
mal direct-dial lines. 

In the case of OCLC there have been 
and continue to be several options, includ- 
ing use of a communications controller 
attached to a local network, but the pre- 
ferred solution among large libraries is 
now to use the Telecommunications Link- 
ing Program (TLP) to connect to a net- 
work router and from there to OCLC. At 
Harvard University, for instance, this 
technology has enabled large financial 
savings (approximately twenty thousand 
dollars per year) on telecommunications 
by eliminating many stand-alone, isolated, 
dedicated circuits. At the same time, by 
pooling all of the remaining circuits and 
making them universally accessible to 
anyone on the campus network with ap- 
propriate software and authorization, the 
remaining twenty-six virtual circuits can 
be shared on a contention basis by several 
hundred staff members. The TLP also has 
the added advantage that it runs at fifty-six 
KBS (kilobits/second), much faster than 
the standard OCLC telecommunications 
protocol. Finally, OCLC too has recently 
made Internet access for technical ser- 
vices (the Prism service) available to its 
members; users of their reference ser- 
vices had already enjoyed that option. 

At the Harvard College Library, the 
largest unit of the Harvard University li- 
braries, we have discovered that integrat- 
ing access to bibliographic utilities with 
the local processing environment has had 
highly positive results beyond the finan- 
cial savings. There has been a pronounced 
psychological impact on technical services 
staff who have ready and easy access to the 
utilities right at their desktop. It is no 
longer necessary to get up, walk halfway 
across the building to discover whether a 
dedicated OCLC computer is available, 
perhaps interrupt someone, do some 
searching or claiming, and then return, 
again halfway across the building, to one's 
own desk. (And then, no doubt, discover 
that further searching is required.) Rather, 
all this processing can be done, literally, by 
"letting your fingers do the walking." Cata- 
logers no longer tend to batch their 
searches for deferred sessions, but tend to 
process a book as an entire entity. 

We can actually measure the differ- 
ence by comparing transactions in the 
pre- and post-TLP eras. In the first six 
months of fiscal year 1995, searching (ex- 
cluding Title Scan searches) in OCLC by 
the Harvard College Library increased 
84% over the comparable period for fiscal 
year 1994 (103,891 to 191,113 searches). 
Even more significantly, prime-time pro- 
duction (claiming) of records was up 
134% (19,452 to 45,551) and non-prime 
time production was up 100% (8,308 to 
16,645)! The obvious interpretation is 
that we are not only doing more 
searching, because access to OCLC is 
vastly easier and more ubiquitous, but are 
also making more efficient and more eco- 
nomical use of our searching activities 
than we had previously. 
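
The percentages quoted above follow directly from the transaction counts, as the short calculation below shows. The records-per-search ratio is an added illustration, not a measure used in the original analysis; it is simply one assumed way of reading "more efficient use of our searching."

```python
def percent_increase(before, after):
    """Percentage growth from the earlier count to the later count."""
    return (after - before) / before * 100


# First six months of fiscal 1994 versus fiscal 1995, as reported in the text.
counts = {
    "searches": (103891, 191113),
    "prime-time production": (19452, 45551),
    "non-prime-time production": (8308, 16645),
}


def records_per_100_searches(year_index):
    """Records produced (prime plus non-prime) per 100 searches.

    year_index 0 is fiscal 1994, 1 is fiscal 1995. This ratio is an
    illustrative assumption, not a figure from the original study.
    """
    produced = (counts["prime-time production"][year_index]
                + counts["non-prime-time production"][year_index])
    return produced / counts["searches"][year_index] * 100


if __name__ == "__main__":
    for label, (before, after) in counts.items():
        print(f"{label}: up {percent_increase(before, after):.0f}%")
    print(f"records per 100 searches: {records_per_100_searches(0):.1f}"
          f" then {records_per_100_searches(1):.1f}")
```

On these figures production grew faster than searching, so the number of records produced per hundred searches rose from about 26.7 to about 32.5.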

This integration of local systems and 
remote bibliographic utilities, while a pro- 
gressive step, is only now becoming 
the tightly integrated system that we 
need. Because neither RLIN (until June 
1995) nor OCLC (until January 1996) had 
Windows packages, it was not possible to 
run simultaneous Windows sessions in 
conjunction with open local processing 
sessions. As that is changing, however, it 
becomes possible to envision different 
scenarios for processing. For instance, 
rather than batch loading or Generic 
Transfer and Overlay (GTO) transfer of 
records from utility to local system, why 
not use the ability afforded by multiple 
sessions to copy and paste from the utility 
to the local system? It will, of course, be 
necessary to develop the tools (hot keys, 
macros) to convert records from one for- 
mat to another, but that is possible, and it 
is a function that we should expect local 
systems vendors to offer us once they and 
the national bibliographic utilities have 
reached a common threshold in reliance 
on Windows. 

Program for Cooperative 
Cataloging 

Three years after the formation of the 
original Cooperative Cataloging Council 
(CCC) and almost two years after the re- 
lease of its strategic plan, it is worth noting 
the impact that the CCC has had on the 
development and adoption of TSW tech- 
nology. Throughout the reports of the in- 
itial six CCC Task Groups there was a 
common thread of applying new, ad- 
vanced technologies to the process of bib- 
liographic control. A number of the most 
significant challenges were gathered to- 
gether in the automation appendix to the 
CCC's strategic plan (CCC 1994). Among 
those, the most pertinent for TSW devel- 
opment were: 

• Encourage development of Z39.50. 

• Encourage use of TLP/EtherTerm. 

• Expedite development of online cata- 
loging tools. 

• Develop word-processing capabilities 
within a TSW platform. 

• Develop capability of windowing on 
TSWs. 

• Develop customizable macros and 
macro packages. 

Shortly after the CCC released its 
strategic plan, it created a Task Group 
on Automation whose primary charge 
was the issues listed in the automation 
appendix. During its first six months, 
this group managed two significant ac- 
complishments. The first was the In- 
ternet survey (previously mentioned) to 
gather information on development of 
TSWs. I was its primary designer, and 
Kiegel of the University of Washington 
was primarily responsible for the compi- 
lation and interpretation of the re- 
sponses (Kiegel 1994a; Kiegel 1994b). 
We found that most institutions were 
gravitating toward platforms consisting 
of a PC-compatible workstation, with an 
80486-class processor as the then-pre- 
ferred choice. Pentium-class machines 
were beginning to appear in libraries al- 
ready in 1994. Windows was clearly the 
preferred multitasking environment. 
Current specifications for many institu- 
tions now show that they are buying ma- 
chines as follows: 

• Pentium class, 60+ MHz 

• 16 megabytes RAM 

• 340+ megabytes hard drives 

• VGA or SVGA monitor, with fourteen 
inch still the norm but with seventeen 
inch the preferred size if sufficient 
funds could be found — the ability to 
run a monitor at higher resolutions on 
a larger screen can result in a two- 
thirds increase in applications that can 
be viewed comfortably and allow eas- 
ier comparison of multiple records on 
a single screen 

• Mouse 

• Ethernet card 

• DOS, Windows (current versions of 
both) 

• Communications software configured 
for the local system or online catalog. 

The Task Group's successor body, the 
Standing Committee on Automation of 
the Program for Cooperative Cataloging, 
as previously noted, is in the process of 
compiling an ARL SPEC Kit on technical 
services workstations. 

The second major initiative of the Task 
Group on Automation was to convene a 
meeting of library service vendors at LC 
on November 18, 1994. Representatives 
of some forty vendors attended. The ob- 
jectives of the meeting were to introduce 
the vendors to the programs and aims of 
the PCC and to demonstrate for them a 
number of grass-roots enhancements to 
cataloging productivity. Beyond that, the 
intention was to initiate a dialogue be- 
tween the PCC and these vendors in the 
course of which we could exchange ideas 
and plans for furthering the goal of in- 
creasing cataloger productivity. It is grati- 
fying to see that the vendors, while con- 
scious of protecting their corporate 
business plans, are nonetheless beginning
to reveal their corporate strategies for im- 
plementing some of our propositions. 
Some of these concepts are now begin- 
ning to come to market (CCC Automation 
Task Group 1994). 



Authority Records 

In terms of national programs there are 
even more pressing reasons to push for- 
ward with the opportunities afforded by 
the release of RLIN and OCLC under 
Windows. When the CCC's Task Group 
on Availability and Distribution issued its 
final report in 1993, it called for transfer- 
ring distribution of both bibliographic and 
authority records from a system based on 
tape loading to one based on use of FTP. 
That transition has begun and in many 
areas is already largely accomplished, par- 
ticularly where distribution of records 
from OCLC and RLIN to LC, and from 
LC to OCLC and RLIN, is concerned.
By the time this note appears,
OCLC should have completed testing the 
capability to accept and load into its 
PRISM authorities save file NACO re- 
cords received via FTP. Contributors will 
still be required to verify the records' 
uniqueness and update them once they 
are in the save file, thereby actually caus- 
ing them to be indexed in OCLC's mir- 
rored version of the LC authority file and 
from there transmitted to LC for further 
distribution as part of the national author- 
ity file. 

But now, particularly with regard to 
time-sensitive authority records, condi- 
tions demand new solutions. A number of 
institutions have developed program- 
matic responses to the creation of author- 
ity records; chief among these has been 
the sophisticated Cataloger's Toolkit, a
Visual Basic program developed by Gary
Strawn at Northwestern University. The
manual associated with this program is
available for inspection online
(http://www.library.nwu.edu/clarr/). This
program enables technical services staff
to create authority records basically by
clicking a button on the screen. Pasting
that locally created record over to an
open national utility session now becomes
the next challenge.

An alternative approach involving na- 
tional programs where records must be 
handled directly within the bibliographic 
utilities is offered by work being done by 
Robert Bremer of the Online Data Qual-
ity Control Section at OCLC. CONSER
(and Enhance) participants must still en-
hance (lock and replace) preexisting serial
(and nonserial format) records online in 
OCLC, although, as noted above, work to 
enable batch loading of CONSER and 
BIBCO records is ongoing. Working on- 
line in OCLC, it will then be possible to 
use macros Bremer is creating with the 
new OCLC Macro Language for Passport 
for Windows to generate authority rec- 
ords for personal and corporate main or 
added entries. 

Steps such as these are truly exciting 
and enable us to transfer much of the 
manual work associated with the creation 
of authority records to automated proc- 
esses. The PCC Standing Committee on 
Automation has recently established a 
Task Group on Expediting Creation and 
Delivery of Bibliographic and Authority 
Records to investigate and make recom- 
mendations on a generic model for local 
systems vendors that will detail the re- 
quirements of an automated authority 
creation module. The task group is 
charged to: 

• Design a data model, preferably sys-
tem neutral, for system-mediated (i.e.,
macro-driven or programmatically
driven) creation of authority records

• Design a model for real-time transfer 
of records from a local system to a 
bibliographic utility session 

If there is any single area of technical
workstation development (besides the
release of RLIN and OCLC under Win-
dows, and the prospect of systems vendors
releasing terminal emulations and Z39.50
clients under Windows) that has excited
the profession, it is the prospect of macro-
driven, programmatic solutions to many
of our cataloging concerns. I have men-
tioned Gary Strawn's programs at North-
western to create authority records. David
Williamson at the Library of Congress and 
now Robert Bremer of OCLC have taken 
their cue from him and have developed 
similar programs, though OCLC's is one- 
dimensional by comparison. Harvard Col- 
lege Library's Cataloging Services De- 
partment applied a DOS shareware 
program, NewKey, to its cataloging opera- 
tions, and the startling increase in produc- 
tivity we experienced has already been
noted. I have calculated that 1.5 million
keystrokes per year have been eliminated 
(along with the attendant chance for er- 
ror) by automating much of the copy cata- 
loging process. Copy cataloging has for 
the most part become a hot-key process, 
with the operator reviewing the record 
and wanding the barcode. 

It is envisioned that the next genera- 
tion of Windows macro packages might 
enable the process to be even simpler and 
more error free. That might well come at 
the price of moving away from a simple 
scripting language, such as exists in 
NewKey, and adopting complex program- 
ming languages such as Visual Basic and 
its derivatives that are embedded in Pass- 
port for Windows and McGill TCP3270 
version 3.0. In addition, there are other 
stand-alone Windows macro packages — 
such as ProKey and SmartPad — that 
might work well in certain environments, 
but that do not offer the completeness of 
packages based on Visual Basic. 

One can postulate that advances in 
macro programming will free us to con- 
centrate more and more on the truly intel- 
lectual parts of the cataloging process by 
automating increasingly large portions of 
what is now rote keying and rekeying. 

Conclusions 

This is a time of great ferment in technical 
services departments. The threat of out- 
sourcing has added a new urgency to mod- 
ernizing the way we do business. Recog- 
nition of accountability and the bottom 
line is driving much of the progress in 
bringing the real promise of automation 
to technical services. This process is not 
just top-down or systems office-down; fre- 
quently the real developments in techni- 
cal services workstations are happening 
on the front lines with innovative and en- 
trepreneurial staff members in technical 
services departments. Long frustrated by 
the slow pace of mainframe-style auto- 
mation, technically minded individuals in 
institutions large and small are creating 
tools to run on their desktops and in so 
doing are creating the technical services 
workstation. There is indeed a revolution 
going on in technical services today. As we
continue to apply significant program-
matic solutions to many of our more re-
petitive, less intellectually demanding
tasks, we can begin to divert our resources 
and energies to the true tasks that lie 
ahead: providing bibliographic and 
authority control to works both traditional 
and new.

Subject Indexing: Principles and Prac-
tices in the 90s, Proceedings of the
IFLA Satellite Meeting Held in Lis-
bon, Portugal, 17-18 August 1993,
and Sponsored by the IFLA Section
on Classification and Indexing and
the Instituto da Biblioteca Na-
cional e do Livro, Lisbon, Portugal.
Ed. Robert P. Holley and others. UB-
CIM Publications, New Series, vol.
15. München, New Providence, R.I.:
K. G. Saur, 1995. 302p. (ISBN 3-598-
11251-3).

Normally conference proceedings are
only the sum of their parts, and a review
would focus on the contributions of indi- 
vidual papers. This will not be a normal 
review. Instead of reviewing the papers 
presented at the International Federation 
of Library Associations and Institutions
(IFLA) satellite meeting for their
contribution to the intellectual history of 
the field — in this case, classification and 
indexing — this review will concentrate on 
two papers, entitled "Introduction" and 
"Summary." The reason for this becomes 
clear when you realize that this is the 
historic meeting where a list of principles 
underlying subject heading languages was 
brought forth out of a working group 
formed in 1990 and where participants 
from eleven countries described their ex- 
isting national bibliographic systems and 
found, for the most part, that their prac- 
tices were mainly in accord with this list 
of principles. If we are getting closer to 
Universal Subject Control (USC) to 
match the developments in Universal Bib- 
liographic Control (UBC), then this is a 
historic moment. 

"Introduction," by Dorothy McGarry, 
Chairperson, IFLA Section on Classifica- 
tion and Indexing, reported on the pro-
gress made since the 1990 Stockholm
meeting of IFLA, where it was decided to 
look into the feasibility of formulating a 
list of principles underlying subject head- 
ing languages used in various subject ac- 
cess systems throughout the world. A 
working group was able to formulate such 
a list of principles, and it was discussed at 
this 1993 meeting and has been discussed 
since then at the 1995 meeting. 

Having the benefit of reviewing the 
papers presented in Lisbon, Julianne 
Beall, in "Summary," was able to conclude 
that there was wide agreement among the 
eleven countries (Brazil, Canada, Croatia, 
France, Germany, Iran, Poland, Portugal, 
Spain, the United Kingdom, and the 
United States) with regard to a number of 
principles underlying subject heading lan-
guages. Beall's summary presents a draft
of such principles, and she notes that this 
draft is still very much under discussion 
(p. 292). Further revisions of the princi- 
ples have been made since the publication 
of this book. The latest draft, made avail- 
able by Beall in late 1995, is titled "Prin- 
ciples Underlying Subject Heading Lan- 
guages." This draft preserves almost all of 
the language of the principles presented 
in the book but groups and reorders them. 
In this review the language of the princi-
ples comes from Beall's summary in the
book, but their grouping and order of
presentation come from the memo.

It is appropriate to review the entire 
list of principles and document where 
there was less than wide agreement. The 
memo of principles for subject heading 
languages is divided into two sections: 
"Construction Principles" and "Applica- 
tion Principles." 

The section "Construction Principles" 
includes ten principles: four for terminol-
ogy control, one for guidance through a
paradigmatic structure, three for predict-
ability of representations, one for dynamic
and documented development, and one
for audience-oriented vocabulary.

There was wide agreement on four 
closely related principles. The first princi- 
ple, the Uniform Heading Principle 
(which covers both Terminology Control 
and Predictability of Representation), 
states: 

To facilitate synonym control and to collo- 
cate subjects in the display of bibliographic 
records, each concept or named entity that 
is indexed by a subject heading language 
should be represented by one authorized 
heading (p. 292). 

The second principle, the Synonymy 
Principle, states: 

To collocate all material on a given subject 
and to increase the recall power of a sub- 
ject heading language, synonymy should 
be controlled in the subject heading lan- 
guage (p. 292). 

The third principle, the Homonymy
Principle, states:

To prevent the retrieval of irrelevant ma- 
terials and to increase the precision power 
of a subject heading language, homonymy 
should be controlled in the subject head- 
ing language (p. 292).

The seventh principle, the Naming
Principle, states:

To facilitate integrated retrieval, names of 
persons, places, families, corporate bodies 
and works when used in a subject heading 
language of a given catalogue, bibliog- 
raphy or index should be established ac- 
cording to the rules used for author and 
title entries in that catalogue, bibliography 
or index (p. 293-94). 
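
The first three of these principles can be pictured with a small sketch. The Python fragment below is an illustration only: the headings, the see references, and the helper name resolve are hypothetical, invented for this example, and are not taken from the proceedings or from any actual subject heading list. Under those assumptions it shows one authorized heading per concept (Uniform Heading), variant terms routed to that heading (Synonymy), and parenthetical qualifiers keeping homonyms apart (Homonymy).

```python
# Hypothetical authority data, invented for illustration only.
# Each concept has exactly one authorized heading (Uniform Heading Principle).
SEE_REFERENCES = {
    "automobiles": "Automobiles",
    "cars": "Automobiles",            # synonym routed to the authorized form
    "motor cars": "Automobiles",
    "mercury (planet)": "Mercury (Planet)",    # qualifiers separate homonyms
    "mercury (element)": "Mercury (Element)",
}

# An unqualified homonym leads to several candidate headings.
AMBIGUOUS = {
    "mercury": ["Mercury (Element)", "Mercury (Planet)"],
}


def resolve(term):
    """Return the authorized headings that a search term may lead to."""
    # Normalize case and collapse runs of whitespace before lookup.
    key = " ".join(term.lower().split())
    if key in SEE_REFERENCES:
        return [SEE_REFERENCES[key]]
    return list(AMBIGUOUS.get(key, []))
```

In this sketch a synonym always resolves to a single heading, while an unqualified homonym returns every candidate so that the searcher, not the system, chooses among them. That is one possible design, not the only one.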

There was less agreement in principle 
and in practice regarding the fourth prin- 
ciple, the Semantic Principle, which 
states: 

To express the semantic (paradigmatic) 
structure of a subject heading language, 
subject headings should be linked by 
equivalence, hierarchical and coordinate 
relationships (p. 294).

The work done to formulate thesaurus
construction standards (most notably ISO
2788 [1986] and ANSI Z39.19-1993) will
help when this principle is put into prac-
tice by various national bibliographic sys-
tems, including the work done at the 
Library of Congress using Library of Con- 
gress Subject Headings (LCSH). Nancy 
Williamson's paper, "Standards and
Standardization in Subject Analysis Sys-
tems: Current Status and Future Direc-
tions," covers the standards and stand-
ardization efforts in subject analysis
systems. 

The fifth principle, the Syntax Princi- 
ple, had the least agreement because of 
the variation in practice when building 
precoordinated strings: 

To express complex and compound sub- 
jects, the syntax of a subject heading lan- 
guage should link the compound parts of a 
subject heading by syntagmatic relation- 
ships rather than semantic (paradigmatic) 
ones (p. 295). 

This principle is listed under "Con- 
struction Principles," but it most as- 
suredly should be moved to the section 
"Application Principles." Maybe that will 
be done in future drafts, when the work- 
ing group realizes the confusion they 
cause by expecting the subject heading 
language structure to produce what can 
only be represented in an indexing record 
that forms part of a document description, 
namely how the facets of a topic are being 
covered in a document. The subject head- 
ing language should accommodate such 
description, but the expression of such an 
indexing record should not be built into 
the language. Other bibliographic systems
have seen this, but not the Library of 
Congress or those who follow its lead. One 
need only investigate a Medical Subject 
Headings (MeSH) bibliographic record 
and its thesaurus to see this. Here the field 
can benefit from the standardization ef- 
fort in the field of thesaurus construction, 
as Williamson points out. Unfortunately, 
she artificially distinguished between pre- 
coordinate and postcoordinate systems at 
a time when the two are converging in 
online systems. Her paper and Elaine
Svenonius' paper entitled "Precoordina-
tion or Not?" leave a gap of under-
standing about how these two types of
subject heading languages have been con- 
verging — witness the provision for sub- 
headings in MeSH, Public Affairs Infor-
mation Service (PAIS), and other the-
sauri. These thesauri are compiled for pe-
rusal, but the strings do not form part of
the thesaurus. Most descriptors and re-
lated fields of data in the indexing record
for genre, form, audience, time, and place
form something very similar to the LCSH
string, but in a way that is more control-
lable for "limit" commands, geographic
aids, time lines, etc. Most search engines
and computer-processing techniques de-
compose precoordinated strings before
producing keyword indexes. Experts in
vocabulary control will need to realize that 
new processing techniques will facilitate 
multiple views of subject heading lan- 
guages and corresponding indexing rec- 
ords in bibliographic databases. How 
computer-based systems handle this in- 
formation should be determined by ex- 
perts in vocabulary control who can see 
the potential of new displays, new com- 
parisons of retrieved sets, and relation- 
ships between indexing terms and free 
text. 
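
The decomposition of precoordinated strings mentioned above can be sketched in a few lines. The Python fragment below is a minimal illustration, not a description of any particular system: the record data are invented, the function names are hypothetical, and it assumes that subdivisions are separated by a double hyphen, as LCSH strings are conventionally displayed.

```python
from collections import defaultdict


def decompose(heading):
    """Split a precoordinated string on the assumed '--' subdivision separator."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def keyword_index(records):
    """Map each lowercased keyword to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, headings in records.items():
        for heading in headings:
            for facet in decompose(heading):
                # Treat commas as word breaks, then index each word.
                for word in facet.lower().replace(",", " ").split():
                    index[word].add(record_id)
    return index


# Hypothetical records, invented for illustration only.
RECORDS = {
    1: ["Libraries--Automation--United States"],
    2: ["Cataloging--Automation"],
}
```

Once the strings are broken apart this way, the citation order of the original heading no longer constrains retrieval: a search on "automation" reaches both records regardless of where the word fell in the string.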

There is widespread agreement on the 
next three principles: 

Consistency Principle: To achieve and 
maintain consistency, each new subject 
heading admitted into a subject heading 
language should be similar in form and 
structure to comparable headings already 
in the language (p. 293).

The A Posteriori Principle, renamed the
Literary Warrant Principle in the memo,
states:

To reflect the subject content of docu- 
ments, the vocabulary of a subject heading 
language should be developed dynami- 
cally, based on literary warrant, and inte- 
grated systematically with existing 
vocabulary (p. 297). 

User Principle: To meet users' needs, 
the vocabulary of subject headings in a 
subject heading language should be cho- 
sen to reflect the current usage of the 
target audience for the subject heading 
language, whatever that might be, for ex- 
ample the general public or users of a 
specific type of library (p. 296).

Lois Mai Chan, in her paper on prac-
tices in the United States, reported on
variations with the last principle at the
Library of Congress.

The memo's grouping of two Applica-
tion Principles leaves a lot to be desired
because they do not specifically treat the 
issue of citation order if a string is to be 
devised from the subject heading lan- 
guage and nothing is said about the differ- 
ence between an enumerative and a syn- 
thetic subject heading language system. 
Clearly, this section of the Application 
Principles is still under revision. The 
memo adds a Subject Indexing Policy 
Principle that does not appear in the book. 
This principle states: 

To meet user needs and give consistent 
treatment to documents, indexing policies 
giving guidance for subject analysis and 
representation should be developed (p. 3).

The Specificity Principle, renamed the
Specific Heading Principle, states:

To increase the precision power of a sub- 
ject heading language, a subject heading or 
a set of subject headings should be coex- 
tensive with the subject content to which 
it applies (p. 297).

Although Robert Fugmann presented 
a paper at this conference, it is not clear 
from the proceedings how well it was re- 
ceived. He spoke of mandatory indexing 
appended by free indexing (hybrid index
languages), and he warned about the
prospect of fully mechanized, algorithmic
indexing. Had his words been heeded, the
list of principles would have taken this
more into account. Also, the report of
significant changes in the British scene, by
Ia McIlwaine, shows how far from these
principles national subject heading 
schemes can digress. 

Although there is less than universal 
agreement on these principles, and vari- 
ous national systems show a divergence of 
practices, I still hope that the goal of the 
working group will be reached, namely,
to promote understanding of different 
subject heading languages by identifying 
commonalities underlying them and pro- 
viding a structure for their comparative 
study. Efforts have been under way to 
carry out such a study since this confer-
ence. I hope more than library catalog
systems will be investigated. The world of 
bibliographic control includes abstracting 
and indexing databases, and their subject 
heading languages are very important if
two other goals of the working group are
to be reached, namely, the provision of a 
statement of what is meant by a good 
Subject Heading Language and the provi- 
sion of a theoretical rationale for particu- 
lar standards or guidelines for Subject 
Heading Language construction and ap- 
plication. 

The working group has moved forward 
since this 1993 meeting, and it has carried 
out a survey of each principle as illustrated 
in statements and examples taken from 
various systems. Texts from sources pub- 
lished in languages other than English 
were translated. This survey should be 
published sometime in the next year. It 
will become another important document 
in the history of internationally accepted 
principles for subject heading languages. 
The working group is to be commended 
for its diligence, steadfastness, and coop- 
erative spirit. 

What impact the group's work will have 
remains to be seen because developments 
in cyberspace might outdistance it very 
soon. Hybrid systems promise to be the 
new standard, with clearly defined dis- 
tinctions between pre- and postcoordi- 
nated systems a thing of the past. Our field 
may still have some impact, however, if we 
are seen to provide structure in the new 
information environment of hypertextual 
displays and graphic user interfaces. — 
Pauline Atherton Cochrane, Graduate 
School of Library and Information Sci- 
ence, University of Illinois, Urbana- 
Champaign 

Academic Libraries as High-Tech 
Gateways: A Guide to Design and 
Space Decisions. By Richard J. Bazil- 
lion and Connie Braun. Chicago: ALA, 
1995. 225p. $36 (ISBN 0-8389-0656- 
7). LC 95-14035. 

Digital libraries are in our future. 
When the wholly digital research library 
will emerge, how aggressively we ought to 
work to achieve it, and how we might best 
do so are matters presently at issue. Cer- 
tainly it is now technologically possible to 
create a true digital library, but a complex 
of psychological, social, legal, and eco- 
nomic barriers require that we proceed 
incrementally. Yet even if the wholly digi-
tal library were currently within our grasp,
it is arguable whether it would be a desir- 
able end when one considers the library as 
a physical place for the bringing together 
of intellectual, social, and service agents 
and values to create a whole much larger 
than the sum of its parts. But assuming the
wholly digital library will happen at some 
point, and assuming we find it desirable 
from cognitive, social, and financial per- 
spectives, the question remains: How do 
we get from here to there? 

While Academic Libraries as High- 
Tech Gateways does not address this 
question directly, it does bring to the fore 
and at least touch on some important
issues that afford us the opportunity to 
reflect on ways in which academic librar- 
ies are evolving in response to the digital 
revolution. The objects of study in this 
volume are the libraries of Brandon Uni-
versity (Manitoba) and Indiana Univer-
sity-Purdue University in Indianapolis;
Lilly Library, Earlham College (Indiana);
Leavey Library, University of Southern 
California; Wehr Library, Viterbo College 
(Wisconsin); and the Information Arcade, 
University of Iowa Library. All six facilities 
are in various ways innovative in their ad- 
aptation to, or incorporation of, digital 
technology in space, design, and services. 
These libraries might fairly be regarded as 
transitional libraries that collectively have 
taken a major step toward achieving the 
future library. Although Bazillion and 
Braun apparently intend this volume for 
the planner of a new facility, and so focus 
more on the practical aspects of planning 
and decision making than on the idea- 
tional bases for these and future libraries, 
they do provide several object lessons that 
all who contemplate the future of aca- 
demic libraries would do well to consider. 

It is unclear for whom exactly this book 
is intended, however, and it might be an 
uncertainty in the authors' minds that 
leads to a rather stark separation between 
the practical and the ideational. Two 
works in fact seem to be stitched together. 
The first, comprising chapters two, three, 
and four, is a detailed, highly practical 
(and for the hands-on project manager, 
potentially quite valuable) review of issues 
and options related to the building shell,
physical plant, space calculations, need
for flexibility and modularity in planning,
networking and wiring, interior design,
and furniture. Many of the focused discus- 
sions and assessments of choices to be 
made are quite impressive, clearly dis- 
playing considerable experience and 
thoughtful insight on the part of the 
authors. For example, the nearly twenty 
pages on instructions for and pitfalls in
calculating shelving, study, and work 
space are of value to all but the most 
experienced veterans of this activity, as are 
the twelve pages on wiring and network- 
ing options. Undoubtedly these chapters 
appeal to librarians with limited planning 
and building experience and perhaps to 
architects with little or no exposure to 
academic libraries or information tech- 
nology. 

The second work, contained in the 
preface, chapter 1, and chapters 5 and 6, 
is decidedly different in character and 
would seem to speak to a very different 
audience, most likely the conceptualizer 
looking for frameworks within which ideas 
might be articulated. Bazillion and Braun 
begin their book with the characterization 
of the facilities at hand as "teaching instru- 
ments" (p. xii) for instruction in electronic 
research skills, noting briefly that the pro- 
liferation of personal laptop computers 
and of arcades and information commons 
constitute "the most important influence 
on library design in the 1990s" (p. xiii). 
But the promising opening theme is sus- 
pended at this point in favor of a didactic 
survey of information technology trends
(e.g., CD-ROMs, Internet, electronic 
publishing, copyright, librarian roles) and 
the core chapters on physical plant issues. 
Not until the beginning of chapter 5 (p.
130), where they enumerate eight leading 
characteristics of the teaching library (in- 
cluding ubiquitous power and data, in- 
struction rooms, specially designed ser- 
vice points, and an arcade or a commons 
for the dynamic interaction of worksta- 
tions, users, and staff), do they return to 
the conceptual high ground. Had this lat- 
ter discussion occurred early on, the 
reader would have been able to advance 
through the bulk of the work with in- 
creased understanding of the larger con-
text for making the broad range of archi-
tectural and physical-plant decisions that 
are dependent on the overall program- 
matic philosophy and direction of the fa- 
cility. 

Better yet, a full integration of the two
distinct portions of the book might have
addressed a series of important questions 
and examined their practical consequences: 

• How are these new academic library 
facilities integrating paper and elec- 
tronic technologies on the one hand 
and traditional and innovative services 
on the other hand, and what space- 
planning and design trade-offs 
thereby arise for the planner of a new 
facility? 

• To what extent were the original con- 
cepts behind these facilities realized, 
and what trade-offs were made as they 
were built? 

• How has broad access to electronic 
bibliographic and full-text resources, 
convenient entry into and easy naviga- 
tion of local and worldwide networks, 
and ready availability of productivity 
and creativity software, all from a sin- 
gle workstation within a library setting 
(what I would term "holistic comput- 
ing"), compelled us to redefine librar- 
ies and library services? 

• And what are the space and design 
consequences of this development?

The dichotomy is unfortunate also be-
cause it precludes an assessment of the six
facilities that would enable us to extrapo- 
late lessons and advice for the planners of 
the next wave of transitional libraries on a 
conceptual level beyond rather narrow 
physical-plant decisions. As it stands, con- 
cluding discussions on the need for library 
involvement in the curriculum and the 
need for instruction in electronic research 
skills seem out of place and eerily disem- 
bodied in relation to the middle chapters 
and the apparent primary purpose of the 
work. 

Despite some problems with the struc- 
ture of this work, Bazillion and Braun have 
in fact provided us with a worthwhile con- 
tribution to the expanding dialogue on the 
future of academic libraries. While they are 
(legitimately) focused on the progressive li-
brary of today rather than the library of tomor-
row, they do in fact stimulate considerable
thought about what it is in these facilities 
that we ought to scrutinize for our collec- 
tive well-being. For instance, their discus- 
sion on librarians as teachers (p. 136-43)
points up the need to design our facilities 
with maximum accommodation for in- 
structional opportunities. Their descrip- 
tion of information arcades and commons, 
when we associate it with separate over- 
views of instruction for electronic re- 
search skills and holistic computing, pre-
cipitates a cluster of questions regarding
the kind of core public service areas and 
human support systems we need to design 
into academic libraries for the foreseeable 
future. And their confirmation of the con- 
tinuing value of paper collections begs a 
series of questions related to the integra- 
tion of paper and electronic technologies 
across the design, space, and services con- 
tinuum. 

In sum, this remains a good read for 
either the planner or the idea-monger if 
the reader is willing to make a major ac- 
commodation for at least half the work. 
The person contemplating public services 
programs in the next generation of librar- 
ies is not likely to be interested in lighting 
options (p. 70-76), and the project man- 
ager preoccupied with lighting issues is 
not likely to be much concerned with a 
generic syllabus for teaching information 
technology (p. 141-42). But in the end, no 
one thinking about a new academic library 
construction, renovation, or remodeling 
program can afford not to read and to 
learn from this book. — Chris Ferguson, 
Leavey Library, University of Southern
California, Los Angeles 

Managing Internet Information Ser- 
vices. By Cricket Liu and others. Se- 
bastopol, Calif.: O'Reilly, 1994. 630p. 
$29.95 (ISBN 1-56592-062-7).

O'Reilly released this "Nutshell Hand-
book" at the end of 1994, and given the 
pace of change on the Internet, it is sur- 
prising that most of it is still useful to 
people who want to provide information 
services on the Internet, even though the 
book devotes too much space to setting up 
Gopher systems and never mentions the 
Netscape web browser. The preface is
careful to note that the bulk of the book is
aimed at people who are competent 
UNIX system administrators or "those 
with fairly strong UNIX knowledge," but 
asserts that the two chapters at the begin- 
ning and the two chapters at the end of the 
book are appropriate for "less technical 
people" (p. xxvi). The first chapter gives 
an overview of what the Internet is and
different ways to be connected. The sec- 
ond chapter gives a summary of the kinds 
of services it is possible to set up (finger, 
telnet, mailing lists, FTP [file transfer pro- 
tocol], Gopher, WAIS [Wide Area Infor- 
mation Server], and World Wide Web) 
and offers some ideas on reckoning the 
technical and human requirements for 
running such services. The next twenty-six 
chapters move from general to specific 
with each type of service — with how-tos 
on setting up and administering particular 
software programs that run on computers 
with the UNIX operating system. And 
there are a couple of chapters on ways to 
enhance the security of systems. Systems 
administrators that set up and use as few 
as one or two of these programs (most of 
which are freely available over the In- 
ternet or included in most implementa- 
tions of the UNIX operating system) 
would benefit from having Managing In- 
ternet Information Services on the sys- 
tems administrator's bookshelf along with 
whatever more in-depth documentation 
they can get their hands on, such as the 
documentation that is distributed with 
each program. However, this book pro- 
vides a reassuring step-by-step approach 
that is often missing from program docu- 
mentation. 

The authors conceive the tasks presented
in the book as typically performed
by people with two distinct roles: the systems
administrator is in charge of the performance
of the computer and the installation
and configuration of the programs,
while the "data librarian" is responsible
for the organization and presentation of
the information. In the library world, this
might mean that systems librarians would 
do the system administration, while the 
data-librarian function would be performed
by catalogers or reference librarians.
Twelve of the thirty chapters in the
book and parts of other chapters are
marked, with a small graphic icon of an 
open book, as being particularly appropri- 
ate for the data librarian. The balance of 
the book is aimed at systems administra- 
tors. Notable among the data-librarian 
sections are the final two chapters in the 
book on legal and copyright issues and on 
protecting intellectual property. The no- 
tion of publishing information on the In- 
ternet and then using cryptographic 
methods to keep the information away 
from people who are not your customers 
seems somewhat anathema to the spirit of 
free exchange in the open system design 
of the Internet, itself traditionally embed- 
ded in the research and education envi- 
ronment. But that is what the authors have 
put forth as the main option for protecting 
your intellectual property. 

Regrettably, setting up a Gopher serv- 
er receives too much emphasis in this 
handbook. Although the Gopher material 
(eight chapters and three appendixes) 
might be useful to many people adminis- 
tering systems that are already in place, 
people starting from scratch will likely 
ignore these sections, skipping ahead to 
the chapters on the World Wide Web. 
World Wide Web servers, the authors 
point out, have many advantages over Go- 
pher servers (graphic capabilities and 
more powerful and flexible linking capabilities).
Perhaps a future edition of the
book will admit that Gopher is now largely 
superseded by the Web and will devote 
much less space to Gopher. 

The authors do a good job of explaining
the client-server dichotomy. Even though 
the book is really about servers and not 
clients, the authors nevertheless explain 
how Web browsers are capable of being
clients not only of Web servers but also of 
Gopher and FTP servers. And there are 
thumbnail sketches of several notable 
Web browsers that run on UNIX ma- 
chines, including Lynx, the widely used 
text-only alternative to Netscape or Mo- 
saic. Netscape did not yet exist when the 
book was written, and consequently it isn't 
mentioned. The chapters on the Web provide
a good introduction to HTML (hypertext
markup language), Web design issues,
clickable image maps, the Common
Gateway Interface, forms, and, of course,
setting up a server. The authors provide 
pointers to HTML authoring and conver- 
sion tools available on the Internet. 

Managing Internet Information Ser- 
vices combines general information — 
succinct and understandable definitions 
and explanations of some broad Internet 
concepts — with very specific information 
such as step-by-step help for people set- 
ting up the particular programs high- 
lighted in the book. The Web server soft- 
ware highlighted is the httpd (for 
"hypertext transfer protocol daemon," a 
"daemon" being a program that runs auto- 
matically in the background without hav- 
ing to be invoked by a user or administra- 
tor) program from the National Center for 
Supercomputing Applications (NCSA). 
The FTP server software presented is the 
Washington University Archive FTP 
daemon (WU-ftpd). The database index- 
ing and searching engine discussed is free- 
WAIS. The mail reflector software dis- 
cussed is Majordomo. There is a chapter 
devoted to Xinetd, a drop-in replacement 
with security enhancements for the UNIX 
standard inetd (for "Internet daemon," a 
superserver that keeps track of multiple
Internet service programs). And there are 
the chapters on the Gopher daemon. All 
these chapters are silver bullets for people
who want to set up these particular pro- 
grams on a UNIX machine. If you don't 
want to get near a UNIX box, or want to 
use programs other than the ones men- 
tioned, the book is still useful for the con- 
ceptual overview and the sections aimed 
at the data librarian. 

But why wouldn't you want to use the 
programs presented? Yes, you could run 
some other Web server software (com- 
mercial or not) on UNIX or Windows NT 
or a Macintosh. But according to the 
authors, "Right now, UNIX is the platform 
best-suited to providing Internet informa- 
tion services: most of the implementations 
of these services are on UNIX. Most ser- 
vices support delivery to a number of plat- 
forms, including PC's and Macs, but those 
computers don't yet have the speed or 
sophistication to handle hosting a full- 
blown information service" (p. 6). UNIX 
is the de facto standard operating system 
for computers providing information on
the Internet, and the programs presented 
are themselves virtually standard. They 
are widely used, easy to obtain, and rela- 
tively well supported. (Programs that are 
freely available are usually provided as is, 
without any guarantee, and without tele- 
phone support. They are well supported 
by user groups on the Internet accessible 
through USENET news, etc.). Arguably, 
this view of the Internet, with UNIX as the 
preeminent operating system, and freely 
available programs presented as the 
standard service programs, might be less 
defensible in late 1995 than it was when 
this handbook was written. UNIX might 
or might not keep its place as the preemi- 
nent Internet operating system; the stand- 
ard programs presented might be aug- 
mented or replaced by other 
programs. —John P. Edwards, Teachers
College, Columbia University 

Digital Libraries '94: The First An- 
nual Conference on the Theory and 
Practice of Digital Libraries, June 
19-21, 1994, College Station, 
Texas. Ed. J. L. Schnase and others. 
Hypermedia Research Laboratory, 
Texas A & M Univ., 1994. 221p. 
This book is a compilation of papers 
presented at the First Annual Conference 
on the Theory and Practice of Digital Li- 
braries held in College Station, Texas, in 
June of 1994. Since then, of course, a 
second conference has been held, and the 
proceedings of that conference are also 
available in a separate publication. Both 
these proceedings and the 1995 proceed- 
ings can also be accessed online at URL 
http://bush.cs.tamu.edu. 

The genesis for the 1994 conference 
was the 1993 solicitation from the 
National Science Foundation/Advanced 
Research Projects Agency/National Aero- 
nautics and Space Administration (NSF/ 
ARPA/NASA) that invited universities and 
their partners in the scholarly and com- 
mercial sectors to submit proposals for 
projects that would implement some as- 
pect of the digital library. The solicitation 
was deliberately vague so as to encourage 
maximum creativity in proposals. Because 
only a few of the many proposals received
were funded, the digital libraries conference
was organized as a forum for researchers
to air the many diverse ideas
contained in all the proposals. Conse- 
quently, this book contains papers that 
describe many of the projects for which 
proposals were submitted. To ensure the 
quality of papers presented, the editors 
report that each received at least three 
reviews. The result is a high-quality volume
of conference proceedings.

The book contains twenty-nine schol- 
arly papers and thirteen position state- 
ments — short one- or two-page papers 
describing digital library research. The 
authors of the NSF/ARPA/NASA solicitation
had hoped that a very wide range of
proposals would be generated. The papers 
in these proceedings reflect exactly that. 
Rather than coalescing around a single 
vision of the digital library, they reflect a 
very wide diversity of ideas and research. 
These proceedings make it clear that, at 
the time the papers were written, digital 
library research was concerned mostly 
with the pieces that might one day lead to 
the digital library. The papers are highly 
diverse, and it is difficult to find common 
themes that unite them. Probably for that 
reason, the editors made no attempt to 
organize the papers by topic. Some papers 
do not fit neatly into a single category but 
straddle several. Nevertheless, as diverse 
as they are, some common strands can be 
gleaned from the papers. 

One group of papers examines the 
digital library broadly. The authors ask the 
questions: 

• What is a digital library? 

• What are the assumptions underlying 
the digital library? 

• What kinds of digital libraries are 
there (taxonomies)? 

• How are they organized? 

Six papers deal with these questions, 
ranging from the philosophical (why 
should a digital library be called a library?) 
to the practical (what architectures are 
most suitable for implementation?). Areas 
of research that need to be addressed are 
also discussed in these papers. 

A second group of papers examines 
particular test-bed implementations of 
digital libraries, such as in particular institutions
or in particular topical areas. Nine
papers deal with this aspect. Here too the 
papers are diverse. Some deal with 
scalability of the digital library in a global 
environment. Others look at implementa- 
tion in particular, specialized areas such as 
forensic medicine and botanical hyperme- 
dia information systems. Yet others dis- 
cuss support for manipulation of large 
quantities of statistical data and use of the 
digital library to support science teaching. 
It is clear from these papers that a great 
deal of work is being done in testing ideas 
and models in test-bed initiatives. 

The issue of the human-machine inter- 
face for a digital library is addressed by 
four articles. Two deal with interface is- 
sues as they apply to the scalability of the 
digital libraries with very large collections. 
How can informational content be organ- 
ized in large collections so that users can 
identify and select appropriate sources? 
Others deal more with issues from the 
users' perspectives—how should users
best interact with the digital library? 

Seven papers deal with issues of or- 
ganization of information within digital 
libraries. Once again, the topics within 
this group are very diverse. They deal with 
data compression and indexing; storage 
and retrieval of videos; finding syntactic 
and semantic relationships in digital docu- 
ments; using "community memory" to join 
large-scale digital libraries with the activi- 
ties of community members; and using 
linguistic ontologies to enhance retrieval 
from large digital databases. 

One of the thorny issues, of course, is
the relationship between publishing and 
libraries in a digital environment. Many of 
the unsolved problems of implementing 
digital libraries lie in this area. Several of 
the papers presented deal with this in 
passing. However, one paper takes a more 
in-depth look at this future relationship
through a description of Project ELVYN,
a project that links the Institute of Physics
Publishing with a number of academic li- 
braries for the purpose of testing a new 
model of information delivery from publish- 
ers to libraries. 

Finally, three papers look at the digital 
library within the context of intelligent 
agents of various sorts that will shape the
digital library and will allow users unprecedented
access to information. There
is wide recognition that the digital envi- 
ronment both requires and is amenable to 
"intelligent" tools for information access. 
The research reported here looks at three 
very different projects — intelligent access 
in a K-12 environment, use of agents for 
retrieval of digital images, and knowl- 
edge-based retrieval from heterogeneous 
information sources. 

The proceedings of this conference 
were a landmark in the sense that they 
brought together many of the major players
and ideas in what is truly a multidisciplinary
field. The list of both individual
and institutional participants in this re- 
search is very impressive. However, the 
proceedings also highlight the enormous 
amount of work, both in research and implementation,
that needs to be done before
we can truly point to working models
of the digital library. The very diversity of 
the papers indicates that the path to the 
digital library is a highly complex one. 
By bringing together so many re- 
searchers in so many different fields, the 
prospect of real progress has been in- 
creased considerably. These proceedings 
and those of the 1995 conference are es- 
sential introductions to current thought 
and research into the digital library of the 
future. —Peter Liebscher, Palmer School
of Library and Information Science, Long 
Island University 

Format Integration and Its Effect on 
Cataloging, Training, and Systems: 
Papers Presented at the ALCTS 
Preconference, "Implementing US- 
MARC Format Integration," 
American Library Association An- 
nual Conference, June 26, 
San Francisco, California. Ed. 
Karen Coyle. ALCTS Papers on Library
Technical Services and Collections,
no. 4. Chicago: ALA, 1993. 110p.
(ISBN 0-8389-3432-3). LC 93-19721. 
"The goal of Format Integration is the 
creation of a single USMARC biblio- 
graphic format that provides the complete 
range of content designation for all types 
of materials and in which all information 
of the same type is identified by the same 
content designation. Format Integration
provides for the communication of re- 
cords for complex items whose descrip- 
tions may include serial, archival control 
and/or multiple material type aspects" (p. 
11). Thus begins Patton and Weiss' contribution
to Karen Coyle's edited volume
published in 1993. This book is the fruit 
of a Machine-Readable Bibliographic Information
Committee (MARBI) preconference
on format integration held before
the 1992 ALA Conference in San Fran- 
cisco, California. Authors of the chapters 
were speakers at the preconference, com- 
ing from varying but related professional 
fields of endeavor, each one well versed 
on the intricacies of format integration: a 
programmer, a technical services librar- 
ian, a network trainer, a Library of Con- 
gress representative from the Network 
Development and Machine-Readable
Cataloging (MARC) Standards Office, an 
OCLC Online Computer Library Center, 
Inc., (OCLC) product specialist, and sev- 
eral systems librarians. 

Beginning with a brief introduction, 
Coyle, technical specialist for the Univer- 
sity of California's MELVYL system, re- 
minds us that the authors of the articles 
have had no real experience with format 
integration, but assures us that "... For- 
mat Integration is a relatively simple task" 
(p. viii). Next, Coyle offers an explanation 
of the technical setup. As each USMARC 
tag is first mentioned in each chapter, the 
name of the tag is given: "... field 246
(Varying Form of Title)" (p. ix). Following
these preliminaries, the authors individu- 
ally launch into chapter topics that cover 
a full range of implementation considera- 
tions: an overview of format integration, 
the effect of format integration on cata- 
loging, the treatment of monographic, 
multimedia, and serials materials, train- 
ing, documentation concerns, the vital 
roles of the utilities, the impact of format 
integration on local systems, and the end 
result on online public access catalogs. 

As the impact on various formats is 
discussed in chapters entitled "Mono- 
graphic Materials," "Multimedia Materi- 
als," and "Serials," examples of MARC 
records before and after format integration
are given, each preceded by a concise
explanation of what has occurred. There
are many examples of sample biblio- 
graphic records created with an inte- 
grated MARC format. Following the ex- 
amples, the text comes full circle through 
discussions of how training, documenta- 
tion, utilities, local systems, and online 
public access catalogs will be affected 
through implementation. The interrela- 
tion of these five categories is obvious as 
one topic easily leads into the next. 

The appendix consists of charts com- 
paring the display of the leader, 006 field, 
and the 008 field of OCLC, the Research 
Libraries Information Network, and the 
Western Library Network (WLN). A glos- 
sary and acronyms follow. An added index 
of MARC fields is included at the end. 

Readers might want to compare parts 
of this book with the "Special Section: 
Format Integration" in the June 1990 Information
Technology and Libraries that
features papers based on presentations 
made at the MARBI program on format 
integration at the 1989 ALA Annual Con- 
ference in Dallas, Texas. The types of 
changes, the handling of serials and mixed 
media, coordinating the implementation, 
and applying format integration are dis- 
cussed in an equally clear and succinct man- 
ner, complete with examples of MARC rec- 
ords, although not as extensive and without 
the before-after comparisons. 

It is much easier now than two years 
ago to obtain up-to-date information on 
the implementation of format integration. 
With electronic resources so readily avail- 
able, a wealth of information is available. 
Now it is possible, with only a few clicks 
of a mouse button, to go directly to the 
World Wide Web pages and peruse not 
only technical services of individual librar- 
ies, but also those of the utilities. 

In September 1995 the Library of Con- 
gress announced through its USMARC 
home page (http://lcweb.loc.gov/marc/)
that it should have its system development 
work finished for the final (1995) phase by 
March 1, 1996. On October 13, 1995, 
OCLC announced through its home page 
(http://www.oclc.org/oclc/press/951013.htm)
that it hopes to be finished with its
implementation of the final phase by 
March 3, 1996, a date agreed upon after 
consultation with the Library of Congress,
the National Library of Canada, ISM Li- 
brary Information Services, the Research 
Library Group, and the Western Library 
Network. 

Format integration is now reality. Cer- 
tainly, a major concern now has to be the 
reaction of system vendors toward imple- 
mentation. Have the vendors' reactions to 
format integration implementation been 
slow (or nonexistent)? Or have the vendors 
taken swift and timely action toward full 
implementation? It is indeed frustrating, 
for instance, for catalogers who find it nec- 
essary to work around systems that are not 
fully implemented in format integration. 

Now that the final phase of implementation
is to be completed by early 1996, it
behooves all persons involved with format 
integration to take another look at this 
gem of a handbook. No doubt the subject 
matter will be better understood now than 
it was in 1993, when the book was first 
published, although in 1996, as in 1993, 
librarians are still grappling with how their 
libraries might be affected by format integration.
Regardless, through use of this
book along with the classic Format Inte- 
gration and Its Effect on the USMARC 
Bibliographic Format, now available in a 
third edition, the planning for and imple- 
mentation of format integration should 
unfold as Coyle so precisely stated in her 
chapter entitled "Online Public Access 
Catalogs": "Format Integration represents
an evolution of the USMARC formats,
not a revolution. If Format Integration
truly is successful, the users never
will know it happened" (p. 98). —Kathleen
Sparkman, Library Technical Services, 
Cataloging, Baylor University 
Doris Hargrett Clack, 1928-1995:
Educator, Gentle Activist, 
and Mentor 

Alva T. Stone 



"The future is longer than the past!" she
used to tell us in class. It was Doris Clack's
first year as a full-time professor, twenty-two
years ago, at the Florida State University
School of Library Science. Back then
the required beginning course, LIS 521— 
Cataloging & Classification, was a six-credit-hour
course, and FSU was still under
the quarter system. There were more
than a few of us in that class who had never 
planned to be catalogers. But Clack's enthusiasm
was infectious. She made us understand
how the library catalog was a
product of lasting substance created solely 
by librarians; the other services provided 
by librarians, while certainly of great
value, were by nature more intangible and 
fleeting in significance. If catalogers are 
conscientious and do their work well, they 
not only leave a permanent record of their 
efforts, but they also perform a service 
that is essential to support the reference 
staff's fulfillment of our library patrons' 
needs. There was a lot of theory discussed 
in her lectures, not just the practitioner's 
rules. We might not have believed it then, 
but in later years we discovered that the 
theoretical foundation that had been 
drummed into us would come back to aid 
us at times when we had to face challeng- 
ing questions related to cataloging or clas- 
sification. 

When Doris Clack died of cancer, on 
November 22, 1995, some people were 
shocked to learn that she was sixty-seven
years old. She had always been so elegant,
tall and fashionably dressed, with a milk- 
chocolate complexion, a beautiful smile, 
and regal posture. It was not the first time 
that I'd heard expressions of surprise 
upon the discovery of our beloved profes- 
sor's real age. Back in 1974, Clack's gradu- 
ate assistant, Michele Newberry Dalehite, 
was working at library school one day 
when one of our fellow students passed by
and reported excitedly to Michele that he 
had seen Clack's dissertation from the 
University of Pittsburgh, and that, accord- 
ing to information in it, she was forty-five 
years old! The word spread around the 
place, and everyone was stunned. We 
swore she didn't look a day over thirty, and 
she certainly didn't act like some old, middle-aged
person. Well, she was serious
about cataloging and firm, even strict, in 
teaching us the discipline. But she also 
seemed very caring, and she followed the
fashion trends, and — dare I say it? —
somebody saw her at ALA once when she
had her hair in an Afro! Groovy! 

The graduate assistant, Michele, took 
all of the courses that Clack offered, and 
later, with a fresh M.L.S. degree, was hired
by a fledgling organization in Atlanta called 
SOLINET. Over the next several years 
Michele became known all over the South- 
east as she trained catalogers in the MARC 
format and this new online cataloging net- 
work, OCLC. Michele has done other work 
since then, has continually been involved in 
professional associations, and just last year
assumed the presidency of the Library 
and Information Technology Association 
(LITA)! Doris Clack was so proud of 
Michele, as she was of all her former stu- 
dents who went on to make their own 
contributions to the profession. It pleased 
her to think, and she was not mistaken in 
this, that she may have had a part in inspir- 
ing the achiever to reach for success. 

No one would deny that Clack ap- 
proached her calling as an educator with 
a sense of dedication that seemed almost 
boundless. In addition to presenting her 
best effort to degree-seeking students, 
Clack took on responsibility for continu- 
ing education of catalogers all over North 
Florida and surrounding regions. She was 
good. And when they asked her to speak,
to help train them, she could never say no.
There were workshops on Anglo-Ameri- 
can Cataloguing Rules (AACR) revised 
chapter 6 and, later, on AACR2, on the 
Dewey Decimal Classification 19 and 20, 
on filing rules, on MARC formats, and on 
cataloging for nonbook materials, microcomputer
software, or archives and manuscripts.
Practitioners assembled in Tallahassee,
Gainesville, Panama City, Miami,
Tampa, Orlando, Daytona, or St. 
Augustine (in Florida), and in Birming- 
ham, Alabama, or Jackson, Mississippi, to 
learn from Doris Clack the latest rules and 
practices they needed to be effective in 
their jobs. People still talk about the 
March 1979 International Conference on 
AACR2 that was held in Tallahassee and 
organized by Clack. This was what we now 
call "cutting edge" stuff, with AACR2 just
hot off the presses and speakers like 
Michael Gorman, Ben Tucker, Ronald 
Hagler, et al. Participants were 228 li- 
brarians from all over the United States, 
and from Canada, Great Britain, and 
Puerto Rico. Seven years later, Clack rec- 
ognized the significance of another new 
development when she invited the Li- 
brary of Congress' Mary K. D. Pietris to 
Tallahassee to introduce catalogers to the 
Subject Cataloging Manual: Subject 
Headings. It was 1986, and that event, 
called the "Workshop on Subject Access 
in Libraries," also featured a young 
scholar whom Clack had characterized as
a "rising star" in the profession, Karen
Markey (now Karen M. Drabenstott). 

When the university changed to the 
semester system, in the early 1980s, the 
required cataloging course became a four- 
hour, one-semester class. This meant 
fewer total class contact hours to cover 
basic theory and practice, while at the 
same time, new issues and trends had to 
be addressed, such as MARC, the biblio- 
graphic utilities, retrospective conversion, 
and new catalog formats (COM and online).
It was difficult making the necessary
adjustments to scope and extent in the 
course content. Clack began to develop a 
reputation as a demanding professor, and 
the course was feared by some students 
who didn't want to be challenged in a class 
that they had already decided they would 
not like. Nevertheless, after some news 
about Doris Clack's death was posted on the 
Internet, more than a dozen unsolicited re- 
sponses were received from appreciative 
and admiring former students as far away 
as Indiana and California. 

"Doris Clack was such an enthusiastic 
teacher and such an encourager," said one 
message. Another person admitted that 
"Clack really had to work with me to help 
my poor brain understand cataloging . . . 
she was a wonderful lady and she will live
in my memories, with a smile." In a similar
vein: "Although I didn't excel at catalog- 
ing, she made it an interesting course, and 
I have always admired her accomplishments
and character." And "I was far from
her most distinguished student, but there 
was no way I could not like, respect, and 
admire her. She was a great woman and a 
great lady." One final tribute, from a col- 
league in South Florida: "I had known she 
was ill for quite some time, but it still hurts 
to know that one of the greats at FSU is 
no longer with us. ... I can honestly say
that she inspired me to become a compe- 
tent cataloger. ... I no longer catalog since 
I am now a manager. But her lessons about 
quality and caring about what is truly im- 
portant will always remain with me." She 
made scholarly contributions to our field, 
as well, which have influenced teachers,
students, and practitioners who never had 
the fortune of meeting her. Among the 
books authored or edited by Clack, at least 
two have been published by the American
Library Association: Authority Control: 
Principles, Applications and Instructions 
(1990) and The Making of a Code: The 
Issues Underlying AACR2 (1980). An- 
other book, Black Literature Resources: 
Analysis and Organization (1975), pub- 
lished by M. Dekker, became an essential 
aid to collection analysis in the area of 
black studies, as well as a tool for evaluat- 
ing possible prejudice or biases in the 
subject heading and classification treatments
of racial minorities. Many of her
articles (at least ten) on these and related 
subjects were published in professional 
journals, particularly Library Resources &
Technical Services and Cataloging &
Classification Quarterly.

Her record of service in professional 
organizations was staggering. A member 
of many executive boards and commit- 
tees, at various times she also served as 
chair of the Cataloging and Classification 
Section of the Association for Library Collections
& Technical Services (ALCTS),
its Subject Analysis Committee (SAC), its
Council of Regional Groups, and the
Association for College & Research Libraries
Afro-American Studies Librarians
Section. This ALA-related activism 
prompted several of her colleagues from 
other universities to share the following 
sentiments. "I was shocked to hear of 
Doris Clack's passing. ... My, this is quite
a loss." "She was a most conscientious 
professional and a congenial person." "I 
got to know her mostly when she was chair 
of SAC, and I developed a lot of respect 
for her — a real professional but always 
considerate and kindly."

This manner of hers, the ability to couple
a businesslike intellect with human
compassion, did not go unnoticed at Flor- 
ida State University. Not only was she 
elected twice as the School of Library and 
Information Science (SLIS) repre- 
sentative to the Faculty Senate, she was 
also called upon to serve on some of the 
most difficult university committees, in- 
cluding search committees for new deans 
of the SLIS, of the College of Education, 
and of Graduate Studies.

Amazingly, this remarkable professor 
also found time to be active in her community
and in her church. In the 1970s
she was vice president of the Parent- 
Teacher Association at her children's 
school, and she was the den mother for 
Boy Scout Pack 87 for four years! Shortly
after I had completed my M.L.S., I heard
her mention the Boy Scouts, and I asked 
her how on earth she could find time for
that. She said that she'd decided to make 
the time for it, because it was important 
that she be involved and spend time with 
her two sons. This was when I learned that 
the two boys had been fairly young when 
she'd gone to Pittsburgh to work toward 
her Ph.D. They had remained with their 
father, Doris's devoted husband, Harold
Clack, in Tallahassee. After seven years as 
a high-school teacher, followed by fourteen
years as Head of Cataloging and then
Head of Technical Services at Florida A & M 
University Library, and the birth of two 
sons, here was this woman at the brink of 
middle age, going up north to work on 
another academic degree. "I had to do it," 
she told me, and she mentioned that 
sometimes it's necessary to make sacri- 
fices to accomplish your true goals in life, 
when you finally realize what your voca- 
tion should be. The women's liberation 
movement, around this time, was paying 
a lot of lip service to the notions that 
women could do anything men could do 
and that women have the right to be 
strong and to determine their own desti- 
nies. Clack did not talk about it. She did 
it. And that was the most enduring lesson, 
perhaps, that she taught to me and many 
other students, of both sexes. 

Doris was an African-American. Her 
ethnic heritage was an important part of 
her identity. This may not have been ob- 
vious to her students, because in class she 
was always focused on the purpose and 
realities of cataloging and classification.
But she had many commitments and in- 
volvements, with the university's Black 
Student Union, the campus Equal Opportunity
committees, and M. L. King Distinguished
Service awards, as well as a long
and extensive service record in the Black 
Caucus of ALA. During the last decade of 
her life, Clack found a way to combine her 
talent and skill as a teacher with her inter- 
est in her ancestors' homelands. She spent 
a year in Nigeria teaching in the Department
of Library Science, University of
Maiduguri. During that year, she visited, 
did consultant work, and gave lectures 
about information organization and about 
technology in libraries in cities and towns 
all over Nigeria. Six years later, in 1994, 
she was able to return to Africa, this time 
to Ghana, where she made speeches at the 
University of Ghana and also before the 
Council of Catalogers in Academic 
Libraries. And only six months before her 
death, she went to Uganda to lecture at a 
workshop on authority control. At her fu- 
neral service last November, we were told 
that everyone in West Africa who came to 
know her loved her, and this I do believe. 
One friend in Morso, Ghana, even named 
her daughter after her, Doris Clack 
Donker. 

Clack was ordained as a deaconess in 
the Bethel Missionary Baptist Church in 
Tallahassee in 1995. Eventually she found 
time for all things important, didn't she? I 
will never forget how she used to tell us in 
class, "The future is longer than the past!" 
Oh, I know . . . the context always had 
something to do with Cataloging. It could 
have been a decision to change the 
library's whole collection over to the LC 
Classification. It might have been a revision
to rules for forms of names, one
which would make access to materials
more "user-friendly." Or, it may have been 
the effort we later had to make to learn 
and implement that complicated MARC 
format, while we were still filing cards and 
that "online" catalog seemed like some- 
body's pipe dream. "The future is longer 
than the past!" It was an expression so 
positive, forward looking, and full of hope 
that I have been able to use it to help guide 
me, not only in my professional career, but 
also in my personal life, when I have had 
to make a commitment to change. I know 
that I am not alone when I say, Dr. Clack, 
you were a wonderful educator, a gentle 
but effective activist, and, above all, a 
strong role model, or mentor — in short, 
an inspiration to us all! 

Memorial contributions are being ac- 
cepted to establish a cataloging scholar- 
ship; donors should write "Clack Memo- 
rial" on the memo line of checks made 
payable to the FSU Foundation and mail 
these to Dean Jane B. Robbins, School of 
Library and Information Studies, Florida 
State University, Tallahassee, FL 32306- 
2048. Another memorial fund supports 
the Bethel Christian Academy, for which 
donations may be mailed to: Bethel Mis- 
sionary Baptist Church, 224 N. Martin 
Luther King Jr. Blvd., Tallahassee, FL 
32301. 